Preface to Second Edition 


The author has been very gratified with the reception of Modern Factor Analysis 
over the past seven years. During this time he listened intently to reviewers,
correspondents, and colleagues. This revised edition takes cognizance of such suggestions
and criticisms—improving, it is hoped, the presentation of the original material— 
and also introduces much new material. 

Since the first edition of this text was published, factor analysis has made
considerable progress in several directions. Continued research has led to new techniques for
coping with long-standing problems in factor analysis as well as for pushing into areas 
previously unexplored. Largely responsible were the advances made in electronic 
computers and their associated programs. Not only have the computers suggested 
new avenues of research, but because of their greater availability and reduced
costs, factor analysis has been applied in many new areas and by many new workers.
All this is reflected in the revised edition, but always with emphasis on the
methodological aspects rather than the applications.

The organization of the text remains unchanged—consisting of five major parts 
covering the basic foundation of factor analysis, direct solutions, derived solutions, 
factor measurements, and problem material. Within this structure, however, many 
changes have been introduced. Thus, the original chapter 17 has become the new 
chapter 10 (properly placed in part II); the old chapters 7 and 8 have been consolidated
into new chapter 7; the old chapter 10 has been condensed and has become a section
of the new chapter 8; while entirely new material will be found in chapter 9 and
sections 8.2, 8.8, 15.5, and 16.3. In addition, many other sections of the book were
changed to bring them in line with present-day theory and practice. The net result is 
that the new edition, although totally revised, is about the same length as the first 
edition. 

Important changes were introduced as a direct result of the impact of computers: 
certain classical methods of factor analysis are now obsolete. For example, the 
centroid method, for all its historical interest and importance, is a technique that has
given way to the more mathematically sound principal-factor method made possible 
by the modern-day computer. By the same token, hand-computing techniques such as 
those covered in sections 3.3 and 3.4 assume less importance in the present computer 
era, but their study has considerable heuristic value. In general, the author still 
favors an eclectic approach, but more attention is paid to the present-day relevance 
of the various methods and recommendations are made among the several
alternatives.

The author has drawn upon many published researches to enhance the text, and 
the acknowledgments made in the preface of the first edition are implied once again. 
In addition, I especially want to extend my thanks to Professor Robert I. Jennrich 
and my son, Alvin J. Harman, both of whom suggested a more consistent use of 
matrix notation and critically reviewed portions of the revised manuscript. Also, 
Dr. Jennrich made available to me draft copies of his paper and computer program, 
upon which the direct oblimin method of section 15.5 is based. 

In the attempt to keep up to date on the new developments in factor analysis, 
I have been aided materially by the stimulating and provocative discussions of 
the Factor Analysis Work Group, sponsored in part by the U.S. Office of Naval 
Research, during 1962-66. The regular members at the semiannual meetings of this 
group included L. R. Tucker, chairman, R. Bargmann, H. H. Harman, C. W. Harris, 
P. Horst, L. G. Humphreys, H. F. Kaiser, W. Meredith, and C. Wrigley, and visitors 
at specific meetings included L. Guttman, K. Joreskog, and A. Madansky. 

Once again, I am pleased to express my gratitude to Margot von Mendelssohn 
for her continued assistance in updating the bibliography and in the preparation of 
the manuscript for the revised edition. 

Harry H. Harman 

Princeton, New Jersey 





Preface to First Edition 


Modern Factor Analysis reflects the progress made in the nineteen years since my
publication with Professor Karl J. Holzinger of Factor Analysis, and includes the
advances that have taken place in the computing art. Many new concepts and
procedures have been developed in these years while some unwieldy methods are now
obsolete. 

A revised edition of Holzinger and Harman’s Factor Analysis might have appeared 
several years ago were it not for the untimely death of Dr. Holzinger in 1954. We were 
just laying the groundwork for the revision of our original work, and were
anticipating the actual start in the late summer of that year, following his appointment as
visiting professor at the University of California at Berkeley. In addition to the deep 
personal loss, Dr. Holzinger’s death ended a professional partnership of twenty years 
which the writer held in highest esteem. 

In the light of the vast and broad advances that have occurred in factor analysis 
due largely to the advent of electronic computers, a revision alone would have been 
insufficient to include the new material covered in the present text. Among the 
more important features of Modern Factor Analysis are the following: (1) the
treatment of “simple structure” concepts and methods, usually associated with the
Thurstone school of factor analysis; (2) the introduction of analytical methods of
rotation to desired final solutions; (3) the use of high-speed electronic computers
in factor analysis; (4) the use of the square root method for desk calculator operations;
(5) the introduction of statistical tests of hypotheses in factor analysis; (6) the
presentation of problems and exercises, and answers to go with them; and (7) the very
extensive, pertinent bibliography on the theory and methods of factor analysis. 
This book is intended to serve as a reference treatise on factor analysis in the current 
stage of advancement of the subject. Furthermore, it is hoped that its utility as a
textbook will be enhanced by the many problems and exercises, as well as the
computing algorithms and summaries of concepts and notation.

The text is organized into five major parts, covering the foundation of factor
analysis, direct solutions, derived solutions, some special topics, and problem 
material. The six chapters of part I provide the historical background and basic 
notions of factor analysis, introducing the fundamental mathematics necessary for 
a proper understanding of the subject, and concluding with an enumeration and 
general description of the principal forms of factor solutions. Part II develops 
several of these solutions directly from the observed data (the correlations among 
the variables). This is accomplished in five chapters where computing procedures 
and illustrative examples are presented as well as the theoretical developments. 
The notion of derived solutions is introduced in part III by considering the
relationships among the arbitrary choices of factor solutions. Then follows an elaboration
of the simple structure principles, the distinction between primary and reference 
coordinate systems, and the analytical methods of arriving at either orthogonal or 
oblique multiple-factor solutions from some arbitrary initial solution. In part IV 
two broad special topics are considered. While most of the text is concerned with the 
resolution of the observed variables in terms of the factors, here a chapter is devoted
to the inverse problem of measuring the factors in terms of the variables. Some very
effective computing procedures are presented for this purpose. The final chapter is 
concerned with formal statistical tests in factor analysis, in contradistinction to the 
implicit mathematical basis assumed in standard factor analysis methods. 

Following the text proper there are several additional items. In part V a large 
group of problems and exercises not only provides useful material for classroom 
use, but is intended to support and supplement the formal presentation. Of the 
statistical tables presented in the Appendix, some are standard and are provided 
simply for the convenience of the reader while some are unique to factor analysis. 
The Bibliography has been carefully culled from the vast literature to include only 
those specific papers and treatises that are relevant to the theory and methods of 
factor analysis. A myriad of applications has purposefully been excluded. Finally, 
a very detailed Index is included. 

Throughout the text the theoretical developments are elucidated both by the 
detailed computing procedures and the numerical illustrations. The particular
content area of these data has no special significance since they are employed merely
to exemplify the techniques. For this reason, many of the numerical examples that 
were first employed by Holzinger and Harman, and have since become classics in 
the literature, are used in Modern Factor Analysis to illustrate the new techniques 
and make possible comparisons with the old. 

While factor analysis was created and developed largely by psychologists, its 
usefulness as a statistical tool has much broader implications. It is in the latter sense 
that the subject is presented in this text. Psychological data are used for illustrative 
purposes, but no attempt is made to formulate new psychological theories. This 
book is intended to provide the student with the basis for a clear understanding of 
the concepts and techniques of factor analysis. He is then in a position to employ 
factor analysis as a statistical tool in developing theories in psychology or other 
disciplines. 

Mathematical language is employed in order to present a precise, unambiguous
picture—not for the sake of mere elegance, nor even for rigorous proof. Some proofs 
are given, some roughly indicated, some omitted entirely. Primarily, the work is an 
exposition and not a formal mathematical development. However, the mathematical 
language represents careful selection and is precise. It is the author’s firm belief that 
a little effort on the part of the reader to grasp and to follow the notation will yield 
immeasurable rewards in understanding the subject. 

The author gratefully renders due homage to the many individuals whose published 
researches have been drawn upon for the enrichment of these pages. I wish it were 
possible in every instance to assign credit where it is due, but in any event, I want 
to avoid claiming credit to myself for any development first made by another. It is 
a particular pleasure for me to acknowledge my debt to both Professor Holzinger and 
Professor Thurstone for my early development in the field of factor analysis. 

Of course many of the author’s thoughts on the foundation and basic mode of 
presenting the factor problem were incorporated in his joint work with Dr. Holzinger. 
He is indebted to the University of Chicago Press for permission to use portions of 
such work which first appeared in Holzinger and Harman, Factor Analysis. 

Permission from John Wiley and Sons, Inc., to reprint material from the author’s 
chapter, “Factor Analysis,” in Mathematical Methods for Digital Computers, edited
by Anthony Ralston and Herbert S. Wilf, is gratefully acknowledged. I am indebted 
to Professor Sir Ronald A. Fisher, F.R.S., Cambridge, and to Dr. Frank Yates, F.R.S., 
Rothamsted, also to Messrs. Oliver & Boyd Ltd., Edinburgh, for permission to 
reprint Table D in the Appendix, taken from their book Statistical Tables for 
Biological, Agricultural and Medical Research.

Portions of the manuscript have been read, and helpful suggestions made, by 
Dr. Edith S. Jay and Dr. John M. Leiman. Invaluable assistance was provided by 
Mr. Leonard W. Staugas in the application of electronic computers to the solution 
of many problems in the text. The drafting of the illustrations is largely due to 
Mr. Toshio Odano. Mr. Wayne H. Jones offered critical assistance and wise counsel 
on many occasions during the final preparation of this book, and was especially 
helpful in regard to the final draft of chapter 17. All of the foregoing are my colleagues 
at the System Development Corporation, and to each one I acknowledge
indebtedness and deep appreciation.

The author had the benefit of consultation with Professor John B. Carroll, Harvard 
University, Professor Henry F. Kaiser, University of Illinois, and Dr. David R. 
Saunders, Educational Testing Service, in regard to the analytical methods for the 
multiple-factor solution. In addition, Professor Kaiser gave me a critical appraisal 
of the first drafts of chapters 14 and 15, and arranged for the calculation of several 
complex problems on the Illiac. Professor Carroll also read these chapters and
provided me with the computer programs and his draft of the oblimin methods as soon
as they were available in late 1958. It is a pleasure to acknowledge appreciation for 
this generous assistance. 

Dr. Ardie Lubin, Walter Reed Army Institute of Research, provided stimulation 
and encouragement at the initiation of the work, and had a major part in preparing 
the first draft of chapter 17. While I may not have written the text along the lines of
compact matrix algebra and precise statistical terms as Dr. Lubin would have liked 
to see it, nonetheless I am grateful for the many incisive discussions with him. 

My sincere thanks are due Mr. Harry A. Liflf for suggesting that the present work 
be undertaken and for his constant encouragement during the last two years of
preparation of the manuscript.

Finally, I owe a special debt of gratitude to Margot von Mendelssohn for her 
tireless assistance in all phases of the book’s preparation. 

H.H.H. 

Pacific Palisades, California

The text is organized to assist the reader in quickly locating any topic or displayed
material. All items are tied to the chapter number. The primary subdivisions of
[...] period, and serial number of section within chapter, e.g., 10.4 for the fourth section
[...] to a table or figure, the identifying word is always employed; but since equations are
distinguished by parentheses, frequently the equation number is given without the 
words “equation” or “formula”. References to bibliographical entries are enclosed 
in square brackets, e.g., Gibson [150]. 

Definitions and symbols are introduced as necessary to clarify the presentation. 
For convenience, the locations of the principal sources of notation are indicated 
in the following table. 

FOUNDATIONS OF FACTOR ANALYSIS






1 

Introduction 


1.1. Brief History of Factor Analysis

Factor analysis is a branch of statistical science, but because of its development 
and extensive use in psychology the technique itself is often mistakenly considered 
as psychological theory. The method came into being specifically to provide 
mathematical models for the explanation of psychological theories of human ability
and behavior. Among the more famous of such theories are those proposed by 
Spearman, Burt, Kelley, Thurstone, Holzinger, and Thomson. 

The birth of factor analysis is generally ascribed to Charles Spearman. His
monumental work in developing a psychological theory involving a single general factor
and a number of specific factors goes back to 1904 when his paper “General
Intelligence, Objectively Determined and Measured” was published in the American
Journal of Psychology. Of course, his 1904 investigation was only the beginning of 
his work in developing the Two-Factor Theory, and his early work is not explicitly 
in terms of “factors.” Perhaps a more crucial article, certainly insofar as the statistical 
aspects are concerned, is the 1901 paper by Karl Pearson [386] in which he sets forth 
“the method of principal axes.” Nevertheless, Spearman, who devoted the remaining 
forty years of his life to the development of factor analysis, is regarded as the father 
of the subject. 

A considerable amount of work on the psychological theories and mathematical
foundations of factor analysis followed in the next twenty years. The principal 
contributors during this period included Charles Spearman, Cyril Burt, Karl Pearson, 
Godfrey H. Thomson, J. C. Maxwell Garnett, and Karl Holzinger; the topics 
receiving the greatest attention were attempts to prove or disprove the existence of 
general ability, the study of sampling errors of tetrad differences, and computational 
methods for a single general factor which included the fundamental formula of the 
centroid solution. 




The early modern period, including the bulk of the active and published controversy 
on factor analysis, came after 1925, with a real spurt of activity in the 1930’s. By this
time it had become quite apparent that Spearman’s Two-Factor Theory was not 
always adequate to describe a battery of psychological tests. So group factors found 
their way into factor analysis; although the experimenters, at first, were very reluctant 
to admit such deviation from the basic theory and restricted the group factors to as 
small a number as possible. What actually happened was that the theory of a general 
and specific factors in Spearman’s original form was superseded by theories of many 
group factors, but the early method continued to be employed to determine these 
many factors. Then it naturally followed that some workers explored the possibility 
of extracting several factors directly from a matrix of correlations among tests, and 
thus arose the concept of multiple-factor analysis in the work of Garnett [140]. 

While the actual term may be attributed to L. L. Thurstone, and while he
undoubtedly has done most to popularize the method of multiple-factor analysis, he
certainly was not the first to take exception to Spearman’s Two-Factor Theory 
and was not the first to develop a theory of many factors. It is not even the centroid 
method of analysis (see sec. 8.9) for which Thurstone deserves a place of prominence
in factor analysis. The centroid method is clearly admitted by Thurstone to be a 
computational compromise for the principal-factor solution. The truly remarkable 
contribution of Thurstone was the generalization of Spearman’s tetrad-difference 
criterion to the rank of the correlation matrix as the basis for determining the number 
of common factors (see 4.10 and chap. 5). He saw that a zero tetrad-difference 
corresponded to the vanishing of a second-order determinant, and extended this 
notion to the vanishing of higher order determinants as the condition for more than 
a single factor. The matrix formulation of the problem has greatly facilitated further 
advances in factor analysis. 
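
The correspondence between a zero tetrad difference and a vanishing second-order determinant can be checked numerically. The following is a minimal sketch only: the loadings, the helper names `correlations` and `tetrad_difference`, and the particular choice of rows and columns are assumptions made for illustration and are not taken from the text (the tetrads themselves are defined in 5.3).

```python
def correlations(loadings):
    """Correlations reproduced by common-factor loadings (illustrative helper).

    r_jk = sum over factors p of a_jp * a_kp for j != k; the diagonal is set
    to 1.0 and is not used by the tetrad difference.
    """
    n = len(loadings)
    return [
        [1.0 if j == k else sum(x * y for x, y in zip(loadings[j], loadings[k]))
         for k in range(n)]
        for j in range(n)
    ]


def tetrad_difference(r, j, k, l, m):
    """Second-order determinant formed from rows j, k and columns l, m of r."""
    return r[j][l] * r[k][m] - r[j][m] * r[k][l]


# Hypothetical loadings: four variables on one general factor.
one_factor = [[0.9], [0.8], [0.7], [0.6]]
# Hypothetical loadings: four variables on two common factors.
two_factors = [[0.8, 0.1], [0.7, 0.2], [0.2, 0.7], [0.1, 0.8]]

t_one = tetrad_difference(correlations(one_factor), 0, 1, 2, 3)
t_two = tetrad_difference(correlations(two_factors), 0, 1, 2, 3)

print(f"one general factor: tetrad difference = {t_one:.6f}")
print(f"two common factors: tetrad difference = {t_two:.6f}")
```

With a single general factor the determinant vanishes (up to rounding), while the two-factor loadings chosen here leave it different from zero, so that a higher-order condition would be needed.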

The mathematical techniques inherent in factor analysis certainly are not limited
to psychological applications. The principal concern of factor analysis is the resolution
of a set of variables linearly in terms of (usually) a small number of categories or
“factors”. This resolution can be accomplished by the analysis of the correlations
among the variables. A satisfactory solution will yield factors which convey all the
essential information of the original set of variables. Thus, the chief aim is to attain
scientific parsimony or economy of description.

As will become evident in the course of this text, a given matrix of correlations can 
be factored in an infinite number of ways. (It is not entirely clear whether this well- 
known fact was truly appreciated in the earlier days of factor analysis; and if, in fact, 
the failure to recognize this mathematical truism may not have been the cause of 
the many controversies regarding the “true”, the “best”, or the “invariant” solution 
for a set of variables.) When an infinite number of equally accurate solutions are 
available, the question arises: How shall choice be made among these possibilities?
The preferred types of factor solutions are determined on the basis of two general 
principles: (1) statistical simplicity, and (2) psychological meaningfulness (if the 
content area is psychology). In turn, each of these requires interpretation, and each 
has been applied variously to yield several distinct schools of factor analysts. 
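
The statement that a matrix of correlations can be factored in an infinite number of ways may be illustrated with a small sketch. It assumes a hypothetical set of loadings on two uncorrelated factors and rotates the factor axes through several angles; the names `reproduced_correlations`, `rotate`, and `max_abs_difference` are introduced here for illustration only.

```python
import math


def reproduced_correlations(loadings):
    """Matrix A A' of correlations reproduced by a loading matrix A."""
    return [[sum(x * y for x, y in zip(row_j, row_k)) for row_k in loadings]
            for row_j in loadings]


def rotate(loadings, degrees):
    """Orthogonal rotation of a two-factor loading matrix through an angle."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]


def max_abs_difference(p, q):
    """Largest absolute elementwise difference between two matrices."""
    return max(abs(x - y) for row_p, row_q in zip(p, q)
               for x, y in zip(row_p, row_q))


# Hypothetical loadings of four variables on two factors.
A = [[0.8, 0.1], [0.7, 0.2], [0.2, 0.7], [0.1, 0.8]]
target = reproduced_correlations(A)

discrepancies = {}
for angle in (15, 40, 75):
    B = rotate(A, angle)  # a different, equally accurate solution
    discrepancies[angle] = max_abs_difference(target, reproduced_correlations(B))
    print(angle, [[round(x, 3) for x in row] for row in B], discrepancies[angle])
```

Every angle gives a different table of loadings, yet each reproduces the same correlations to within rounding error; nothing in the correlations themselves singles out one of these solutions.
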

If one were to make his choice entirely upon statistical considerations, a rather
natural approach would be to represent the original set of variables in terms of a 
number of factors, determined in sequence so that at each successive stage the factor 
would account for a maximum of the variance. This statistically optimal solution,
the method of principal axes discussed in chapter 8, was first proposed by Pearson
at the turn of the century, and in the 1930’s Hotelling provided the full development 
of the method. While this procedure is perfectly straightforward, it entails a very 
considerable amount of computation, and becomes impractical with ordinary 
computing facilities when the matrix is of order 10 or greater. In recent years,
however, this difficulty has been overcome by the use of high-speed electronic computers.
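
The idea of determining factors in sequence, each accounting for a maximum of the remaining variance, can be sketched as follows. This is not the computing procedure of chapter 8: it assumes a small hypothetical correlation matrix with unities in the diagonal and uses plain power iteration, and the names `dominant_axis` and `principal_factors` are illustrative.

```python
import math


def dominant_axis(R, iterations=500):
    """Power iteration: largest eigenvalue of R and its unit eigenvector."""
    n = len(R)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(r * x for r, x in zip(row, v)) for row in R]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[j] * sum(R[j][k] * v[k] for k in range(n)) for j in range(n))
    return lam, v


def principal_factors(R, m):
    """Extract m factors in sequence; each takes the maximum remaining variance."""
    n = len(R)
    residual = [row[:] for row in R]
    factors = []
    for _ in range(m):
        lam, v = dominant_axis(residual)
        loadings = [math.sqrt(lam) * x for x in v]
        factors.append((lam, loadings))
        # Remove the part of the matrix accounted for by this factor.
        residual = [[residual[j][k] - loadings[j] * loadings[k]
                     for k in range(n)] for j in range(n)]
    return factors


# Hypothetical correlations among three variables (unities in the diagonal).
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
factors = principal_factors(R, 3)
variances = [lam for lam, _ in factors]
for lam, loadings in factors:
    print(round(lam, 4), [round(a, 3) for a in loadings])
```

The variance accounted for by each factor is the sum of its squared loadings; the contributions decrease from one stage to the next and together exhaust the total variance (here 3, the number of variables).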

Another choice based upon statistical considerations is the centroid solution. As
indicated above, this method was introduced only as a computational expedient
when it became apparent that the principal-factor solution was too laborious. All
that can be said for the centroid method is that it produces without much arithmetic
one of many possible sets of axes which account for the variance in a manner
approximating the optimal situation of the principal axes.
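
How closely the centroid axes approximate the principal axes can be seen in a small case. The sketch below assumes the usual first-centroid formula (column sums divided by the square root of the grand total) applied to a hypothetical correlation matrix with unities in the diagonal; the method itself is treated in sec. 8.9, and the function names are illustrative.

```python
import math

# Hypothetical correlations among three variables (unities in the diagonal).
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]


def centroid_first_factor(R):
    """First centroid loadings: column sums over the root of the grand total."""
    column_sums = [sum(row[k] for row in R) for k in range(len(R))]
    return [s / math.sqrt(sum(column_sums)) for s in column_sums]


def largest_eigenvalue(R, iterations=500):
    """Variance of the first principal axis, found by power iteration."""
    n = len(R)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(r * x for r, x in zip(row, v)) for row in R]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return sum(v[j] * sum(R[j][k] * v[k] for k in range(n)) for j in range(n))


centroid_loadings = centroid_first_factor(R)
centroid_variance = sum(a * a for a in centroid_loadings)
principal_variance = largest_eigenvalue(R)

print("centroid loadings:", [round(a, 3) for a in centroid_loadings])
print("variance, centroid: ", round(centroid_variance, 4))
print("variance, principal:", round(principal_variance, 4))
```

The centroid loadings require only additions and one square root, and in this example the variance they account for falls only slightly short of the maximum attained by the first principal axis.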

The end product of these solutions—principal or centroid—generally is not 
acceptable to psychologists (although Burt sometimes prefers the principal-factor 
solution). In quest of “meaningful” factor solutions, psychologists have introduced 
various theories in the hope of arriving at a form of solution which would be unique 
and apply equally well to intelligence, personality, physical measurements, and any 
other variables with which they might be concerned. Holzinger’s Bi-Factor Theory 
and Thurstone’s Simple Structure Theory are in this class. On the other hand,
Thomson’s Sampling Theory ([465], chaps. 3 and 20) is primarily a psychological 
theory of the mind. There is no preferred type of factor solution obtainable uniquely 
on grounds of psychological significance (see 6.2). If psychological meaningfulness 
rather than a mathematical standard is imposed, then the judgment of the investigator 
must be involved. The recent progress toward objective solutions to this problem is 
presented in chapters 14 and 15. 

As pointed out above, a principal objective of factor analysis is to attain a
parsimonious description of observed data. This aim should not be construed to mean
that factor analysis necessarily attempts to discover the “fundamental” or “basic” 
categories in a given field of investigation such as psychology. It might be desirable 
to base such an analysis upon a set of variables which measures all possible mental 
aspects of a given population as completely and accurately as possible. Even in 
such a case, however, the factors would not be completely fundamental because of 
the omission of important measures which were not yet devised. While the goal 
of complete description cannot be reached theoretically, it may be approached 
practically in a limited field of investigation where a relatively small number of 
variables is considered exhaustive. In all cases, however, factor analysis does give a 
simple interpretation of a given body of data and thus affords a fundamental
description of the particular set of variables analyzed.

The essential purpose of factor analysis has been well expressed by Kelley [306,
p. 120]: “There is no search for timeless, spaceless, populationless truth in factor
analysis; rather, it represents a simple, straightforward problem of description in
several dimensions of a definite group functioning in definite manners, and he who 
assumes to read more remote verities into the factorial outcome is certainly doomed 
to disappointment.” 

1.2. Applications of Factor Analysis 

The application of factor-analysis techniques has been chiefly in the field of 
psychology. This limitation has no foundation other than the fact that it had its 
origin in psychology and that accounts of the subject have tended to be “... so bound
up with the psychological conception of mental factors that an ordinary statistician 
has difficulty in seeing it in a proper setting in relation to the general body of statistical 
method” [312, p. 60]. One objective of this book is to correct this situation. 

The methods of factor analysis may lead to some theory suggested by the form of 
the solution, and conversely one may formulate a theory and verify it by an
appropriate form of factorial solution. The latter approach is illustrated by Spearman’s
theory “all branches of intellectual activity have in common one fundamental function
(or group of functions) whereas the remaining or specific elements of the activity seem
in every case to be wholly different from that in all others” [438, p. 202]. He showed
that if certain relationships (the tetrads defined in 5.3) exist among the correlations,
all the variables can be resolved into linear expressions involving only one general
factor and an additional factor unique to each variable. These relationships furnish
the statistical verification of the “Two-Factor Theory.” If a set of psychological
variables yields correlation coefficients which do not satisfy the preceding
relationships, then a more complex theory may be postulated. This may require several
common factors in the statistical description of the variables. 

One of the earliest proposals for broadening the psychological uses of factor 
analysis was made in 1940 by Truman L. Kelley [307], when he gave a method for 
attaining the greatest social utility while at the same time preserving individual 
liberties and rights. Then during World War II, with the large-scale testing,
classification, and assignment problems, factor analysis was employed widely throughout
the several branches of the military services of the United States. Psychologists, of 
course, have continued to develop and exploit the technique to the present day. 

Many psychologists have engaged in extensive testing programs, employing factor 
analysis to determine a relatively small number of tests to describe the human mind 
as completely as possible. The usual approach includes the factor analysis of a large 
battery of tests in order to identify a few common factors. Then the tests which best 
measure these factors, or, preferably, revised tests based upon these, may be selected 
as direct measures of the “factors of mind.” However, only to the extent to which 
psychologists agree that the tests selected are the “right tests” can they be said to be 
actual measures of the factors. Such “factor tests” should be of a pure nature, 
differing widely from one another so as to cover the entire range of mental activity. 
Several major studies have been undertaken to identify factors from large sets of 
tests. Among the early studies of this type are the Spearman-Holzinger [234] unitary 
trait investigation and Thurstone’s [472] primary mental abilities study. More 
recent studies concerned with the isolation of specific psychological factors are very
numerous, e.g., J. P. Guilford [167], Raymond B. Cattell [76], and John French [135].

Almost as numerous are the applications to fields of psychology other than 
intelligence. Again, by way of illustration, can be included: a study of
temperament [169]; an investigation of executive morale [425]; uses in clinical therapy [346,
361]; analysis of perceptual bases of speaker identity [502]; and determination of
the voting behavior of justices of the Supreme Court [424, 480].

In recent years there has been an ever increasing application of factor analysis to 
fields other than psychology—fields as varied as sociology and meteorology, political 
science and medicine, geography and business. While it is not the intent of this book
to go into any detail on the many content applications of factor analysis, it is fitting 
to note some of the newer areas where it is being used. A sampling from about 200 
such applications into a dozen distinct fields discloses some of the exciting uses of 
factor analysis. The following are illustrative of these applied studies: 

International relations [10, 410, 411, 453];

Urbanization and economic development [37, 38, 48, 154, 157, 163, 220, 393, 421];

Sociology [78, 170, 350, 389];

Economics [13, 124, 310, 315, 316, 532];

Man-machine systems [412, 481];

Accident research [165, 500];

Communications [509, 510];

Taxonomy, from such varied fields as entomology and classification of
publications [45, 46, 406, 407, 431, 434];

Biology [433, 533];

Physiology and medicine [15, 247, 282, 382, 384, 390], with special reference to
cardio-vascular diseases [66, 67, 82, 92, 419];

Geology [273, 317];

Meteorology [17, 517].

From the foregoing brief enumeration of some of the studies employing factor 
analysis, it can be seen that many investigators have come to appreciate the power 
and benefits to be reaped from this multivariate statistical technique.

The applications of factor analysis indicated above are concerned primarily with
classification and verification of scientific hypotheses in the particular field of
investigation. Quite a different use (regarded by some to be the chief value of factor
analysis) is to supplement, and perhaps simplify, conventional statistical techniques
and computations. Typical of this kind of application is the use of factor analysis in
expediting the computation of multiple regression statistics (see, e.g., [108], [90]),
the savings being especially noticeable when the number of variables is large and 
the number of factors small. This comes about from the fact that the task of inverting 
a correlation matrix is reduced essentially from the order of the number of variables 
to the order of the number of factors (see chap. 16). In the problem of studying the 
relationship between two sets of variables factor analysis can again be employed. 
The best linear function of the variables in each set is obtained by factorial methods, 
and then the correlation between these composites gives what is known as the 
canonical correlation [262]. Such a correlation is the maximum possible between the
two sets of variables. 
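
The reduction in the order of the matrix to be inverted, mentioned above in connection with multiple regression, can be sketched for a small case. The sketch assumes that the correlation matrix is reproduced exactly by hypothetical loadings A on two uncorrelated common factors together with unique variances, R = AA' + U², and it relies on a standard matrix identity rather than on the procedure of chapter 16; the name `inverse_via_factors` is illustrative.

```python
def inverse_via_factors(A, u2):
    """Invert R = A A' + U^2 using only a 2 x 2 inverse (two factors).

    Uses the identity R^-1 = U^-2 - U^-2 A (I + A' U^-2 A)^-1 A' U^-2,
    where U^2 is the diagonal matrix of unique variances u2.
    """
    n = len(A)
    B = [[A[j][p] / u2[j] for p in range(2)] for j in range(n)]   # U^-2 A
    M = [[(1.0 if p == q else 0.0) + sum(A[j][p] * B[j][q] for j in range(n))
          for q in range(2)] for p in range(2)]                   # I + A' U^-2 A
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    M_inv = [[M[1][1] / det, -M[0][1] / det],
             [-M[1][0] / det, M[0][0] / det]]
    return [[(1.0 / u2[j] if j == k else 0.0)
             - sum(B[j][p] * M_inv[p][q] * B[k][q]
                   for p in range(2) for q in range(2))
             for k in range(n)] for j in range(n)]


# Hypothetical loadings of four variables on two common factors.
A = [[0.8, 0.1], [0.7, 0.2], [0.2, 0.7], [0.1, 0.8]]
u2 = [1.0 - (a * a + b * b) for a, b in A]  # unique variances
n = len(A)
R = [[sum(x * y for x, y in zip(A[j], A[k])) + (u2[j] if j == k else 0.0)
      for k in range(n)] for j in range(n)]

R_inv = inverse_via_factors(A, u2)
product = [[sum(R[j][i] * R_inv[i][k] for i in range(n)) for k in range(n)]
           for j in range(n)]
max_error = max(abs(product[j][k] - (1.0 if j == k else 0.0))
                for j in range(n) for k in range(n))
print("largest deviation of R R^-1 from the identity:", max_error)
```

Only a matrix of order two (the number of factors) is inverted, although R itself is of order four; the saving grows with the number of variables.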

While the present text is concerned primarily with the exposition of various
procedures in factor analysis, numerous examples are employed for illustrative purposes.
Some examples are hypothetical, but most of them are taken from psychology and 
the social sciences. These examples serve to clarify the theoretical treatment only, 
not to exhibit the practical usefulness of factor analysis. 

1.3. Scientific Explanation and Choice

Factor analysis, like all statistics, is a branch of applied mathematics. Thus, it is 
used as a tool in the empirical sciences. In dealing with observed data, of course, there 
are inherent discrepancies. One of the objectives of statistical theory is to provide a 
scientific law, or mathematical model, to explain the underlying behavior of the 
data. Some simple examples include: (1) a linear regression for the prediction of 
school success from three entrance examinations; (2) a mathematical curve, such as 
the normal distribution or one of the Pearson family of curves, for the explanation 
of an observed frequency distribution; (3) a chi-square test of significance for the 
independence of such classifications as “treated or not treated with a certain serum,” 
and “cured or not cured”. Such laws make allowance for random variations of the 
observed data from the theoretically expected values. It is conceivable that any one 
of several, quite different, mathematical models may provide an equally good fit or 
explanation of a set of data. 

In general, science is concerned with the establishment of laws regarding the behav¬ 
ior of empirical events or elements in the particular field. Such laws—usually 
expressed as mathematical functions—serve to relate the knowledge about the known 
elements and to provide reliable predictions about future elements. The standard 
procedure in developing a scientific theory involves the formulation of a mathemat¬ 
ical law on the basis of some observations (with discrepancies), followed by the verifica¬ 
tion of the particular mathematical model with new observations. For any problem 
in an applied science there may be a number of mathematical theories which explain 
the phenomena in a satisfactory manner. A misunderstanding of the relationship 
between a mathematical model and observed data is frequently encountered. When 
a theory has been successfully employed in describing a set of data, there is a tendency 
to accept this law as the only correct one for describing the observations. 

Furthermore, it is sometimes inferred that nature behaves in precisely the way which the 
mathematics indicates. As a matter of fact, nature never does behave in this way, and there 
are always more mathematical theories than one whose results depart from a given set of data 
by less than the errors of observation. 


The danger is always when a theory has been found to be convenient and effective over a 
long period of time, that people begin to think that nature herself behaves precisely in the 
way which is indicated by the theory. This is never the case, and the belief that it is so may 
close our minds to other possible theories and be a serious impedence to progress in the 
development of our interpretations of the world around us [41, pp. 472, 477, author’s italics].

The above observations apply equally well in different fields. One of the simplest
cases arises in the problem of surveying a small tract of land. For this purpose either 
of two mathematical theories, plane or spherical trigonometry, may be applied. Thus, 
in surveying a city lot the result by either theory would be equally satisfactory, and 
the engineer would prefer the plane theory because of its greater simplicity. In this 
instance, however, there is no doubt as to the greater accuracy of the spherical theory 
since the earth is essentially spherical. 

In the field of astronomy there are two common theories describing the solar 
system. The Ptolemaic and Copernican theories, with suitable modification of the 
former, describe the motions of the planets with equal accuracy. “There is really no 
advantage for either of these theories as compared with the other, as far as their 
adaptability to explain numerically the facts of the solar system is concerned. The 
Copernican theory is, however, much the simpler geometrically and mathematically. 
For this reason it has been adapted and developed until astronomers can predict 
coming celestial events with most surprising accuracy” [41, pp. 477-8]. 

Even the subject of geometry, which might seem to depend on a unique mathemat¬ 
ical theory, can be described by means of many different theories. Thus, the physical 
configurations in a plane can be interpreted in the light of Euclidean geometry, 
Riemannian geometry, or various other types of non-Euclidean geometry. There¬ 
fore, the applied science of geometry can have several alternative theories as its basis. 

As in the foregoing illustrations there are different models, or forms of solution, 
which may arise in the factorial analysis of a particular set of data. The usefulness of 
factor analysis as a scientific tool has been questioned by some workers because of 
this indeterminacy. It should be evident, however, that this is tantamount to indicting 
all applied sciences because they do not depend upon unique theories. 

Since the beginning of this century, psychologists and statisticians have developed
several types of factorial solutions. The proponent of each system of analysis has 
urged its suitability for the interpretation of psychological data. The strong feelings 
and emotions that characterized one period in the development of factor analysis have 
been described with both sarcasm and wit by Cureton [91, p. 287]: 

Factor theory may be defined as a mathematical rationalization. A factor-analyst is an 
individual with a peculiar obsession regarding the nature of mental ability or personality. 

By the application of higher mathematics to wishful thinking, he always proves that his 
original fixed idea or compulsion was right or necessary. In the process he usually proves 
that all other factor-analysts are dangerously insane, and that the only salvation for them 
is to undergo his own brand of analysis in order that the true essence of their several maladies 
may be discovered. Since they never submit to this indignity, he classes them all as hopeless 
cases, and searches about for some branch of mathematics which none of them is likely to 
have studied in order to prove that their incurability is not only necessary but also sufficient. 

The heated and inspired controversies about the “best” method of factor analysis 
are over—Charles Spearman (1863-1945), L. L. Thurstone (1887-1955), Karl J. 
Holzinger (1893-1954) have all passed from the scene. It is not a personal controversy 
that is implied here, but rather the strong conviction of each individual who had 
devoted a major part of his life to the development of a particular school of thought in
factor analysis. The many papers that appeared during the thirties and forties urging
“this method” rather than “that method” had their place in the growth of the subject. 
However, with a fuller understanding of the salient features of each method, and with 
the increased efficiency of computations, the differences among the various methods 
no longer loom so ominously, and the followers of a particular approach are much 
more tolerant of the adherents of an alternative scheme. 

It should be evident that the different types of factorial solutions correspond to 
the different mathematical theories in the description of a particular scientific prob¬ 
lem. Several preferred forms of solution are enumerated (see chap. 6), before the 
detailed presentation of the factor analysis theory and computing procedures. From 
such statements of the salient features of the preferred solutions, the researcher may 
weigh the advantages and limitations of any particular type of solution for his 
particular data. 

It is the sincere hope of the author that the present book will provide a better 
understanding of apparently competing methods of factor analysis. By bringing the 
several methods under proper focus, an unbiased decision regarding the appropriate¬ 
ness of any one of them should be possible. 








2 

Factor Analysis Model 


2.1. Introduction

As noted in the preceding section, one of the first tasks confronting the researcher
who is concerned with the analysis of a body of observed data is the formulation of
a statistical model. Sometimes this fundamental step is overlooked or only tacitly
implied. In any event, a particular model must be acknowledged for any inference
to be made about the observed data. Of course, there are many different models,
depending on the purpose of the analysis. Because of the inherent desire of
scientists to explain observed phenomena in terms of elegant (i.e., simple) theories,
and because the mathematical development might otherwise become intractable,
a linear model is frequently assumed. That is a basic assumption made throughout
this text. The generalizations that might ensue from relaxing this constraint are
indicated by Bartlett [31, pp. 32-34], McDonald, and Wood.

The fundamental statistical concepts and notation required in factor analysis are
set forth in 2.2. This is followed by the linear models employed in factor analysis
in 2.3. The variance composition of a variable, attributable to the different types of
factors postulated in the model, is presented in 2.4. Again, some of the basic terms are
delineated and clearly defined. The factor problem is then presented in 2.5, where
the elements to be determined by the analysis are identified. Then the degree of fit of
the factor model to the observed data is considered in 2.6, and the indeterminacy
of factor solutions is explained in 2.7. Finally, in 2.8 many of the preceding concepts
are put in much more compact form by the use of matrix notation.

2.2. Basic Statistics 

A statistical study typically involves a group of individuals with some common
attributes. The term “individual” is used here in a generic sense to stand for such
objects or entities as persons, census tracts, businesses, etc. Measurements made on
such individuals, or attributes of these entities, are designated simply as variables.
Throughout this text the letter N is used for the total number of individuals, and the
letter n for the number of variables. A particular variable is denoted by $X_j$, or simply
j, which may be any one of the n variables: 1, 2, 3, ..., n. The index i is employed to
designate any one of the N individuals: 1, 2, 3, ..., N. Then, the value of a variable
$X_j$ for individual i is represented by $X_{ji}$, the order of the subscripts being of
importance. A particular $X_{ji}$ is called an observed value, which is measured from an
arbitrary origin and by an arbitrary unit.

It is customary in statistical literature to employ Greek letters to denote population 
parameters (and hypothetical or latent variables) and corresponding Latin letters to 
indicate simple observations of these variables. Also, to indicate a sample estimate 
of a population parameter, the latter symbol is modified by placing a caret (^) over
it. To the extent that it is convenient, and does not violate long established practice 
in factor analysis, such notation will be followed in this text. 

Some simple statistical concepts are presented here for ready reference. Frequent
use is made of the sum of the N values for a variable $X_j$, namely:

(2.1)    $\sum_{i=1}^{N} X_{ji}$    or    $\sum X_{ji}$,

where, in the second expression, the summation is understood to be over all values
of the variable. This convention for the summation with respect to the number of
observations of a particular variable will be observed throughout the text. Furthermore,
the index i will be reserved explicitly to refer to the individuals, that is, for the
range 1 to N.

For a sample of N observations, the mean of any variable is defined by

(2.2)    $\bar{X}_j = \sum X_{ji} / N$.

The n population means would be designated $\mu_j$ if they were required in any
theoretical development. The observed values of the variables may be transformed to more
convenient form by fixing the origin and the unit of measurement. When the origin
is placed at the sample mean, a particular value

(2.3)    $x_{ji} = X_{ji} - \bar{X}_j$

is called a deviate.

The sample variance* of variable $X_j$ is defined by

(2.4)    $s_j^2 = \sum x_{ji}^2 / N$.

The population variances are denoted by $\sigma_j^2$. Now, taking the sample standard
deviation as the unit of measurement, the standardized value of variable j for individual i
is given by

(2.5)    $z_{ji} = x_{ji} / s_j$.

The set of all values $z_{ji}$ (i = 1, 2, ..., N) is called a variable $z_j$ in standard form.
Obviously the variance of $z_j$ is unity.

* It should be noted that the sample value of the variance is a biased estimate of the population
variance from a normal distribution, but multiplication by N/(N - 1) makes it unbiased.

For any two variables j and k the sample covariance is defined by

(2.6)    $s_{jk} = \sum x_{ji} x_{ki} / N$,

and the corresponding population parameter is designated $\sigma_{jk}$. The correlation
coefficient in the universe is $\rho_{jk}$ and in the sample is defined by

(2.7)    $r_{jk} = s_{jk} / (s_j s_k) = \sum z_{ji} z_{ki} / N = \sum x_{ji} x_{ki} / \sqrt{\sum x_{ji}^2 \sum x_{ki}^2}$.

The correlations among all the variables of a study are usually computed as the
initial step in a factor analysis.
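
The definitions (2.2) through (2.7) can be traced numerically. The following is a minimal sketch in Python with NumPy; the six observations on two variables are hypothetical values chosen only for illustration, and the variable names are not part of the text's notation. It uses the divisor N throughout, as in (2.4), and checks that the three expressions for the correlation in (2.7) coincide.

```python
import numpy as np

# Hypothetical observed values X_ji: n = 2 variables (rows), N = 6 individuals (columns).
X = np.array([[2.0, 4.0, 4.0, 5.0, 7.0, 8.0],
              [1.0, 3.0, 2.0, 6.0, 5.0, 7.0]])
N = X.shape[1]

mean = X.sum(axis=1) / N                 # (2.2) sample means
x = X - mean[:, None]                    # (2.3) deviates
s = np.sqrt((x ** 2).sum(axis=1) / N)    # square root of (2.4), divisor N
z = x / s[:, None]                       # (2.5) variables in standard form

s_jk = (x[0] * x[1]).sum() / N           # (2.6) sample covariance

# The three equivalent forms of the correlation coefficient in (2.7).
r_a = s_jk / (s[0] * s[1])
r_b = (z[0] * z[1]).sum() / N
r_c = (x[0] * x[1]).sum() / np.sqrt((x[0] ** 2).sum() * (x[1] ** 2).sum())
print(r_a, r_b, r_c)
```

Because the divisor cancels in the ratio, the correlation is the same whether N or N - 1 is used for the variances and covariance, provided the same divisor is used in both.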

In order to provide concrete illustrations of some of the basic statistics, a very 
simple numerical example is now introduced. The same example will be used through¬ 
out the text to demonstrate the various features of factor analysis. Only N = 12 
individuals and n = 5 variables are considered in order to bring out all aspects of 
factor analysis, while at the same time keeping the computation at a minimum. While 
artificial data might have been contrived to yield exact mathematical solutions, it 
was deemed more advisable to use objective, fallible data, even though most of the
standards regarding experimental design, sampling, and reliability are obviously
ignored. So, while the data are “real,” the results are not intended to have any real
substantive value but merely to illustrate the methods, and perhaps to provide a 
convenient numerical problem for checking of computer programs. 

With this understanding, the data in Table 2.1 were taken (not entirely arbitrarily) 
from a study of the Los Angeles Standard Metropolitan Statistical Area. The twelve 
individuals used in the example are census tracts—small areal subdivisions of Los 
Angeles. A full-scale factor analysis of 67 variables (in percentage form rather than 
actual values to allow for greater comparability) and 1,169 census tracts is reported 
by Burns and Harman [48]. The study was designed to include groups of variables 
involving population, employment, income, and housing characteristics, and these
are represented in the little example. From the raw data, the correlations among the
five variables were computed as the initial step toward subsequent factor analysis 
work. These correlations are shown in Table 2.2.* While the calculation of the 
correlations is singled out as a separate step, it is done so only for ease of presenta¬ 
tion. Nowadays, when large electronic computers are quite readily available for 
factor analysis work, the determination of the correlations is a trivial step in the 
computer process going from the raw data to the final solution. As this example is 
used again and again in the text, the thread will be picked up from the correlations 
to the next phase and subsequent ones. 
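
As a check on the step from raw data to correlations, the computation can be sketched as follows. This is an illustrative Python/NumPy sketch, not the program used for the book; the array `X` is a transcription of Table 2.1, and `np.corrcoef` stands in for formula (2.7). The comment on the divisor of the standard deviations is an observation from this recomputation, not a statement made in the text.

```python
import numpy as np

# Raw data transcribed from Table 2.1: rows are the 12 census tracts, columns are
# (1) total population, (2) median school years, (3) total employment,
# (4) misc. professional services, (5) median value house.
X = np.array([
    [5700, 12.8, 2500, 270, 25000],
    [1000, 10.9,  600,  10, 10000],
    [3400,  8.8, 1000,  10,  9000],
    [3800, 13.6, 1700, 140, 25000],
    [4000, 12.8, 1600, 140, 25000],
    [8200,  8.3, 2600,  60, 12000],
    [1200, 11.4,  400,  10, 16000],
    [9100, 11.5, 3300,  60, 14000],
    [9900, 12.5, 3400, 180, 18000],
    [9600, 13.7, 3600, 390, 25000],
    [9600,  9.6, 3300,  80, 12000],
    [9400, 11.4, 4000, 100, 13000],
])

means = X.mean(axis=0)
# The standard deviations printed in Table 2.1 appear to correspond to the
# N - 1 divisor (ddof=1) rather than the divisor N of definition (2.4).
sds = X.std(axis=0, ddof=1)

# Correlations among the five variables (columns), as in (2.7).
R = np.corrcoef(X, rowvar=False)
print(np.round(means, 1))
print(np.round(sds, 1))
print(np.round(R, 5))
```

The printed matrix should agree with Table 2.2 up to rounding.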

* The only reason for showing five decimal places (as output from an IBM 7094) is to provide
a means for checking numerical calculations.

Table 2.1

Raw Data for Five Socio-Economic Variables

Individual            Total       Median     Total       Misc.           Median
(Tract No.)           Popula-     School     Employ-     Professional    Value
                      tion        Years      ment        Services        House
                      1           2          3           4               5

 1 (1439)             5,700       12.8       2,500       270             25,000
 2 (2078)             1,000       10.9         600        10             10,000
 3 (2408)             3,400        8.8       1,000        10              9,000
 4 (2621)             3,800       13.6       1,700       140             25,000
 5 (7007)             4,000       12.8       1,600       140             25,000
 6 (5312)             8,200        8.3       2,600        60             12,000
 7 (6032)             1,200       11.4         400        10             16,000
 8 (6206)             9,100       11.5       3,300        60             14,000
 9 (4037)             9,900       12.5       3,400       180             18,000
10 (4605)             9,600       13.7       3,600       390             25,000
11 (5323)             9,600        9.6       3,300        80             12,000
12 (5416)             9,400       11.4       4,000       100             13,000

Mean                  6,242       11.4       2,333       121             17,000
Standard deviation    3,440        1.8       1,241       115              6,368

Table 2.2

Correlations among Five Socio-Economic Variables

Variable                        1          2          3          4          5

1. Total population          1.00000     .00975     .97245     .43887     .02241
2. Median school years        .00975    1.00000     .15428     .69141     .86307
3. Total employment           .97245     .15428    1.00000     .51472     .12193
4. Misc. profess. services    .43887     .69141     .51472    1.00000     .77765
5. Median value house         .02241     .86307     .12193     .77765    1.00000

2.3. Linear Models

It is the object of factor analysis to represent a variable Zj in terms of several under¬ 
lying factors, or hypothetical constructs. The simplest mathematical model for 
describing a variable in terms of several others is a linear one, and that is the form 
of representation employed here. However, there are still several alternatives within 
the linear framework, depending on the objective of the analysis. A distinction between 
two objectives can be made immediately, namely: (1) to extract the maximum 
variance; and (2) to “best” reproduce the observed correlations.

An empirical method for the reduction of a large body of data so that a maximum
of the variance is extracted was first proposed by Karl Pearson [386] and fully
developed as the method of principal components, or component analysis, by Harold
Hotelling [259]. The model for component analysis is simply:

(2.8)    $z_j = a_{j1} F_1 + a_{j2} F_2 + \cdots + a_{jn} F_n$    (j = 1, 2, ..., n),

where each of the n observed variables is described linearly in terms of n new
uncorrelated components $F_1, F_2, \ldots, F_n$. An important property of this method, insofar as
the summarization of data is concerned, is that each component, in turn, makes a
maximum contribution to the sum of the variances of the n variables. For a practical
problem only a few components may be retained, especially if they account for a
large percentage of the total variance. However, all the components are required to
reproduce the correlations among the variables.
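
The last two statements can be illustrated with the correlations of Table 2.2. The sketch below (Python with NumPy) assumes one common way of obtaining the coefficients of model (2.8), namely scaling the eigenvectors of the correlation matrix by the square roots of the eigenvalues; it is only an illustration, not the computing procedure developed later in the text, and the variable names are hypothetical.

```python
import numpy as np

# Correlation matrix of Table 2.2 (five socio-economic variables).
R = np.array([
    [1.00000, 0.00975, 0.97245, 0.43887, 0.02241],
    [0.00975, 1.00000, 0.15428, 0.69141, 0.86307],
    [0.97245, 0.15428, 1.00000, 0.51472, 0.12193],
    [0.43887, 0.69141, 0.51472, 1.00000, 0.77765],
    [0.02241, 0.86307, 0.12193, 0.77765, 1.00000],
])

# Eigendecomposition of the symmetric matrix, reordered to descending eigenvalues.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Coefficients a_jp of model (2.8): eigenvectors scaled by sqrt(eigenvalue).
A = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

share = eigvals / eigvals.sum()      # fraction of total variance per component
R_all = A @ A.T                      # correlations reproduced by all n components
R_two = A[:, :2] @ A[:, :2].T        # correlations reproduced by the first two only
print(np.round(share, 3))
print(np.abs(R - R_all).max(), np.abs(R - R_two).max())
```

With all five components the reproduction is exact up to floating-point error, while the first two components account for most of the total variance but leave nonzero residuals in the correlations.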

In contrast to the maximum variance approach, the classical factor analysis model 
is designed to maximally reproduce the correlations (various methods for accomplish¬ 
ing this are the subject of a large portion of this book). The basic factor analysis 
model may be put in the form:

(2.9)    $z_j = a_{j1} F_1 + a_{j2} F_2 + \cdots + a_{jm} F_m + d_j U_j$    (j = 1, 2, ..., n),

where each of the n observed variables is described linearly in terms of m (usually 
much smaller than n) common factors and a unique factor. The common factors 
account for the correlations among the variables, while each unique factor accounts 
for the remaining variance (including error) of that variable. The coefficients of the 
factors are frequently referred to as “loadings.” 

To call attention to the fact that the expression (2.9) is a mathematical model of 
the observed variable, it is sometimes designated by $z_j'$. For simplicity the prime will
usually be omitted in the theoretical expression for the variable. Furthermore, if one 
wanted to be precise, as regards statistical notation, the symbols for the factors and 
their coefficients should be Greek letters since they are in the nature of population 
parameters and can only be inferred from the observed data. However, the notation 
of (2.9) is so well established in the factor analysis literature* that it was deemed 
wiser not to change it. Of course, the a’s and F’s are generally used symbols and imply
no relationships between the two models (2.8) and (2.9). 

The classical factor analysis model (2.9) may be written explicitly for the value of
variable j for individual i as follows:

(2.10)    $z_{ji} = \sum_{p=1}^{m} a_{jp} F_{pi} + d_j U_{ji}$    (i = 1, 2, ..., N; j = 1, 2, ..., n).

In this expression $F_{pi}$ is the value of a common factor p for an individual i, and each
of the m terms $a_{jp} F_{pi}$ represents the contribution of the corresponding factor to the
linear composite, while $d_j U_{ji}$ is the “residual error” in the theoretical representation
of the observed measurement $z_{ji}$.

* The only exception occurs in the mathematical-statistical contributions to factor analysis.

Without any loss of generality, it can be assumed that the F’s and U’s have zero
means and unit variances, since they are unknown in practice. Furthermore, the n 
unique factors are supposed to be independent of one another and also independent 
of the m common factors. In model (2.9) the F’s are considered statistical variates or 
random variables, defined by a probability density function which, for certain pur¬ 
poses, is taken to be normal. When the factors are assumed to be independent normal 
variates, it follows that the z’s have a multivariate normal distribution (discussed 
further in chap. 10). 
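
A small simulation may help make these assumptions concrete. In the following Python/NumPy sketch the loadings, the sample size, and the random seed are all hypothetical choices made for illustration. Standardized, mutually independent normal factors are generated, the variables are built according to (2.10), and the sample correlations are compared with the values implied by the loadings under uncorrelated factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 4, 2, 200_000

# Hypothetical loadings a_jp (n variables by m common factors).
A = np.array([[0.8, 0.1],
              [0.7, 0.3],
              [0.2, 0.9],
              [0.1, 0.6]])
common = (A ** 2).sum(axis=1)    # variance of each variable due to the common factors
d = np.sqrt(1.0 - common)        # unique-factor coefficients giving unit total variance

F = rng.standard_normal((m, N))  # common factors: zero mean, unit variance
U = rng.standard_normal((n, N))  # unique factors, independent of F and of each other
Z = A @ F + d[:, None] * U       # model (2.10), one column per individual

R_sample = np.corrcoef(Z)
R_model = A @ A.T + np.diag(d ** 2)   # correlations implied by the model
print(np.abs(R_sample - R_model).max())
```

The largest discrepancy shrinks as N grows, since the only difference between the two matrices is sampling variation.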

While the statement of the factor analysis model in (2.10) makes explicit the values 
of the factors, they must actually be estimated indirectly, as explained in chapter 16. 
There is another model which overtly is like (2.10), but with the important difference 
that the factor values F pi are treated as parameters to be estimated directly, along 
with the factor loadings. It appears to have certain desirable features, but also many 
unresolved problems. This model, which is investigated by Anderson and Rubin 
[14], Whittle [519], and Jöreskog [285], will not be developed in this text.

The model of primary interest is (2.9), with the basic problem of factor analysis 
being the estimation of the nm loadings of the common factors. Various methods are 
available for accomplishing this and are developed in detail in the chapters following 
the foundations set forth in part i. Throughout this text, the variables are standardized 
and the correlations among them are employed in the calculations of the factor
loadings. Some attempts have been made to estimate the factor loadings directly from 
the observed values of the variables rather than their covariances or correlations 
(see, for example, [236], [414], [519], [545]), but such procedures are not considered 
here. It should be apparent that the factor analysis model bears a strong resemblance 
to that of regression analysis insofar as a variable is described as a linear combina¬ 
tion of another set of variables plus a residual. However, in regression analysis the 
set of independent variables are observable while in factor analysis they are hypo¬ 
thetical constructs which can only be estimated from the observed data. Finally, the 
factors themselves are estimated in a subsequent stage of the analysis (see chap. 16). 

For most of the exposition in this text, no assumptions are made about the statistical 
distributions of the variables. More precisely, the correlations among the variables 
for a given sample are treated as if they were the true correlations in the population, 
ignoring statistical variation. Alternatively, various procedures are developed which 
operate on the correlations among a set of variables to produce solutions in the 
sense of model (2.9), accepting these correlations as mathematical rather than statistical
entities. However, when questions of statistical inference arise (regarding the
number of common factors or the significance of factor loadings), then specific
assumptions on the distribution functions of the factors and the observed variables
are introduced (see chaps. 9 and 10).

2.4. Composition of Variance 

The variance of a variable may be expressed in terms of the factors according to
the model of the preceding section. Thus, applying the definition (2.4) to the model
(2.10) of $z_j$ yields:

$s_j^2 = \sum z_{ji}^2 / N = \sum_{p=1}^{m} a_{jp}^2 (\sum F_{pi}^2 / N) + d_j^2 \sum U_{ji}^2 / N$
        $+ 2 \sum_{p<q} a_{jp} a_{jq} (\sum F_{pi} F_{qi} / N) + 2 d_j \sum_{p=1}^{m} a_{jp} (\sum F_{pi} U_{ji} / N)$,

where, it will be remembered, the sums without an indicated index are on i from 1 to
N. Now, since the variance of a variable in standard form is equal to unity, and all
variables (including the factors) are assumed to be in standard form for any sample,
the last equation may be written

(2.11)    $s_j^2 = 1 = \sum_{p=1}^{m} a_{jp}^2 + d_j^2 + 2 \sum_{p<q} a_{jp} a_{jq} r_{F_p F_q} + 2 d_j \sum_{p=1}^{m} a_{jp} r_{F_p U_j}$.

The unique factors are always uncorrelated with the common factors, and if the
common factors are uncorrelated* among themselves, then (2.11) simplifies to

(2.12)    $s_j^2 = 1 = a_{j1}^2 + a_{j2}^2 + \cdots + a_{jm}^2 + d_j^2$.

The terms on the right represent the portions of the unit variance of $z_j$ ascribable to
the respective factors. For example, $a_{j2}^2$ is the contribution of the factor $F_2$ to the
variance of $z_j$. The total contribution of a factor $F_p$ to the variances of all the variables
is defined to be

(2.13)    $V_p = \sum_{j=1}^{n} a_{jp}^2$    (p = 1, 2, ..., m),

and the total contribution of all the common factors to the total variance of all the
variables is

(2.14)    $V = \sum_{p=1}^{m} V_p$.

The ratio $V/n$ is sometimes employed as an indicator of the completeness of the
factor analysis.
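
The quantities in (2.12) through (2.14) are simple sums of squared loadings. The sketch below (Python with NumPy) uses a hypothetical table of loadings on two uncorrelated common factors; the numbers are invented for illustration and do not come from any solution in the text.

```python
import numpy as np

# Hypothetical loadings a_jp for n = 5 variables on m = 2 uncorrelated common factors.
A = np.array([[0.6,  0.7],
              [0.8, -0.4],
              [0.7,  0.6],
              [0.9,  0.0],
              [0.8, -0.5]])
n, m = A.shape

common = (A ** 2).sum(axis=1)   # common-factor part of each unit variance in (2.12)
d2 = 1.0 - common               # remaining part d_j^2, due to the unique factor
V_p = (A ** 2).sum(axis=0)      # (2.13): contribution of each factor to all variables
V = V_p.sum()                   # (2.14): total contribution of the common factors
print(V_p, V, V / n)
```

Summing the squared loadings by rows gives the per-variable split of (2.12); summing by columns gives the per-factor contributions of (2.13). Both routes lead to the same total V.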

From the composition of the total unit variance as expressed by (2.12), two important
concepts in factor analysis follow: (1) the communality of a variable $z_j$, which is
given by the sum of the squares of the common-factor coefficients,

(2.15)    $h_j^2 = a_{j1}^2 + a_{j2}^2 + \cdots + a_{jm}^2$    (j = 1, 2, ..., n),

and (2) the uniqueness, which is the contribution of the unique factor. The latter
indicates the extent to which the common factors fail to account for the total unit
variance of the variable. Sometimes it is convenient to separate the uniqueness of a
variable into two portions of variance: that due to the particular selection of
variables in the study and that due to unreliability of measurement. If additional
variables were added to a given set, their correlations with the original variables might
necessitate the postulation of further common factors. However, any such potential
linkages of a given variable can only be expressed as a portion of its uniqueness in
the study with the original set of variables. This portion of the uniqueness is termed
the specificity of the variable. The remaining portion, due to imperfections of
measurement, is the error variance or unreliability of the variable. The complement of the
error variance is sometimes called the “reliability” of the variable. In psychology, this
systematic component of the variable (as distinguished from error components) is
usually measured by the correlation between two separate administrations of the
same test, or parallel forms of the test. Two such representations of a variable may
be denoted $z_j$ and $z_J$, and their correlation $r_{jJ}$ is then the measure of the reliability of
the variable.

* The case of correlated factors is treated in detail in chaps. 13 and 15.

When the unique factors are decomposed into the two types described above, the
linear model (2.9) for any variable may be written in the form

(2.16)    $z_j = a_{j1} F_1 + a_{j2} F_2 + \cdots + a_{jm} F_m + b_j S_j + e_j E_j$    (j = 1, 2, ..., n).

Here $S_j$ and $E_j$ are the specific and error factors, respectively, and $b_j$ and $e_j$ their
coefficients. Since the specific and error factors are uncorrelated, the following
relationship between the uniqueness and its component parts is immediately found:

(2.17)    $d_j^2 = b_j^2 + e_j^2$.

Hence, the total variance may be expressed in the following alternative way:

(2.18)    $1 = h_j^2 + d_j^2 = h_j^2 + b_j^2 + e_j^2$.

Thus, the total variance of a variable may be said to be made up of the communality
(attributable to the F’s) and the uniqueness; or, alternatively, the total variance is
made up of the communality, the specificity (attributable to the S’s), and the
unreliability (attributable to the E’s).

By factorial methods the communality $h_j^2$ and the uniqueness $d_j^2$ of each variable
in a set are obtained. The uniqueness of each variable may then be split into the
specificity and unreliability, but this is independent of the factorial solution. If the
reliability $r_{jJ}$ of a variable $z_j$ is known (it may be obtained by experimental methods),
then the error variance may be obtained by means of the equation

$e_j^2 = 1 - r_{jJ}$.

Then, knowing the unreliability, the specificity follows from (2.17), viz.,

$b_j^2 = d_j^2 - e_j^2$,

where the uniqueness $d_j^2$ is known from the factorial solution. Since the reliability is
the complement of the unreliability $e_j^2$, it follows from (2.18) that

$r_{jJ} = h_j^2 + b_j^2$

and hence:

(2.19)    $h_j^2 = r_{jJ} - b_j^2 \le r_{jJ}$.

In other words, the communality of any variable is less than or equal to the reliability
of the variable, and equals the reliability only when the specificity vanishes.

Employing the model (2.16) for the expression of a variable $z_j$ in terms of factors,
the composition of variance of such a variable (in decreasing order of magnitude,
generally) is given by:

          Total variance (1)           $= h_j^2 + b_j^2 + e_j^2 = h_j^2 + d_j^2$,
          Reliability ($r_{jJ}$)       $= h_j^2 + b_j^2 = 1 - e_j^2$,
(2.20)    Communality ($h_j^2$)        $= h_j^2 = 1 - d_j^2$,
          Uniqueness ($d_j^2$)         $= b_j^2 + e_j^2 = 1 - h_j^2$,
          Specificity ($b_j^2$)        $= b_j^2 = d_j^2 - e_j^2$,
          Error variance ($e_j^2$)     $= e_j^2 = 1 - r_{jJ}$.

An index of completeness of factorization may be expressed in the form:

(2.21)    $C_j = 100 h_j^2 / (h_j^2 + b_j^2)$ = 100 communality/reliability.

This index shows the percentage of the reliability variance of a variable accounted
for by the common factors. The index $C_j$ never exceeds 100 and reaches 100
only when the specificity $b_j^2$ vanishes. Obviously the analysis for determining the
coefficients $a_{jp}$ should not be carried to the point where no specificity is present when
dealing with a finite set of variables.
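
The bookkeeping of (2.17) through (2.21) for a single variable can be sketched in a few lines of Python. The communality and reliability used here are hypothetical values chosen for illustration: the first would come from a factorial solution, the second from, say, a test-retest correlation.

```python
# Hypothetical inputs for one variable z_j.
h2 = 0.60     # communality h_j^2, from the factorial solution
r_jJ = 0.85   # reliability, obtained experimentally

d2 = 1.0 - h2                # uniqueness, from (2.18)
e2 = 1.0 - r_jJ              # error variance (unreliability)
b2 = d2 - e2                 # specificity, from (2.17)
C = 100.0 * h2 / (h2 + b2)   # index of completeness (2.21)
print(d2, e2, b2, C)
```

With these inputs the three portions $h_j^2$, $b_j^2$, and $e_j^2$ add to unity, and the index is the communality expressed as a percentage of the reliability.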

Some workers may not care to assume specific or even error factors as indicated 
by (2.16). Then the factors Sj and Ej are not postulated, and the number of common 
factors m may be less than, or equal to, the number of variables n. However, the 
hypothesis of factors indicated by (2.16) appears tenable even for variables which 
appear to describe a set of objects very completely and with great precision. 

2.5. Factor Patterns and Structures 

Having described the composition of variables in terms of factors, it is now possible
to consider in broad outline the objectives of a factor analysis of data. For a set
of n variables, the linear model (2.9) expressing any variable $z_j$ in terms of m common
factors and its unique factor may be written in a slightly expanded form as follows:

          $z_1 = a_{11} F_1 + a_{12} F_2 + \cdots + a_{1m} F_m + d_1 U_1$,
          $z_2 = a_{21} F_1 + a_{22} F_2 + \cdots + a_{2m} F_m + d_2 U_2$,
(2.22)    $\cdots$
          $z_n = a_{n1} F_1 + a_{n2} F_2 + \cdots + a_{nm} F_m + d_n U_n$.

Such a set of equations is called a factor pattern, or more briefly, pattern. For 
simplicity a pattern may be presented in tabular form in which only the coefficients 
of the factors are listed, with the factor designations at the head of the columns. 
Sometimes a table including only the coefficients of the common factors may be 
referred to as the pattern. In a pattern (2.22), the common factors $F_p$ (p = 1, 2, ..., m)
may be either correlated or uncorrelated, but the unique factors $U_j$ (j = 1, 2, ..., n)
are always assumed to be uncorrelated among themselves and with all common
factors. In the linear description of a particular variable, the actual number of com¬ 
mon factors involved may be less than m, some of the coefficients being zero. The 
number of common factors involved in the description of a variable is called its 
complexity. 

Factor analysis yields not only patterns but also correlations between the variables
and the factors. A table of such correlations is called a factor structure, or merely a
structure. Both a structure and a pattern are necessary in order to furnish a complete 
solution. The functional relationships between the elements of a structure and the 
coefficients of a pattern will now be shown. 

Multiplying any one of equations (2.22) by the respective factors, summing over 
the number of observations N, and dividing by N, produces 

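         r_{z_j F_1} = a_{j1} + a_{j2} r_{F_1 F_2} + \cdots + a_{jp} r_{F_1 F_p} + \cdots + a_{jm} r_{F_1 F_m},
         \cdots
(2.23)   r_{z_j F_p} = a_{j1} r_{F_p F_1} + a_{j2} r_{F_p F_2} + \cdots + a_{jp} + \cdots + a_{jm} r_{F_p F_m},
         \cdots
         r_{z_j F_m} = a_{j1} r_{F_m F_1} + a_{j2} r_{F_m F_2} + \cdots + a_{jp} r_{F_m F_p} + \cdots + a_{jm},

and

(2.24)   r_{z_j U_j} = d_j.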

Equation (2.24) shows that the correlations with the unique factors (the elements
r_{z_j U_j} of a factor structure) are always identical with the coefficients of the unique
factors in the pattern. When no confusion can arise, the table of correlations of
variables with common factors only, i.e., the table of r_{z_j F_p}, will be referred to as the
factor structure.


While it might appear that equations (2.23) are to be used to evaluate the structure
elements, more frequently these equations are used to obtain the values of the pattern 
coefficients when the correlations between variables and factors and the correlations 
among the factors themselves are known. Formally, (2.23) may be considered as n
sets of m linear equations in the unknowns a_{jp} (j = 1, \cdots, n; p = 1, \cdots, m), with the
left-hand members as known quantities. It is then possible to solve these systems of
equations for the unknown coefficients a_{jp}. Computing procedures for such formal
solutions are developed in chapter 3 and applied in chapter 13, while the entire part ii 
is devoted to the direct analyses for the factor patterns. 

From equations (2.23) it is apparent that the elements r_{z_j F_p} of a structure are
generally different from the coefficients a_{jp} of a pattern. In case the common factors
F_p are uncorrelated, that is, r_{F_p F_q} = 0 (p \neq q), then equations (2.23) reduce to

(2.25)   r_{z_j F_p} = a_{jp}   (j = 1, 2, \cdots, n; p = 1, 2, \cdots, m).

Thus, only in the case of uncorrelated factors are the elements of a structure identical 
with the corresponding coefficients of a pattern. In an analysis involving uncorrelated 
factors, a complete solution is furnished merely by a factor pattern inasmuch as the 
correlations of the variables with the factors are given by the respective coefficients. 
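
As a minimal sketch of equations (2.23) and (2.25), the following computes structure elements from a pattern and a table of factor correlations. The numerical values of `A` and the two `Phi` tables are hypothetical, chosen only for illustration, and the function name is not from the text.

```python
def structure_from_pattern(A, Phi):
    """Apply (2.23): r_{z_j F_p} = sum over q of a_{jq} * r_{F_q F_p}."""
    m = len(Phi)
    return [[sum(row[q] * Phi[q][p] for q in range(m)) for p in range(m)]
            for row in A]

# Hypothetical pattern coefficients: three variables, two common factors.
A = [[0.8, 0.0],
     [0.6, 0.3],
     [0.0, 0.7]]

# Hypothetical factor correlations: one oblique case, one uncorrelated case.
Phi_oblique = [[1.0, 0.5],
               [0.5, 1.0]]
Phi_uncorrelated = [[1.0, 0.0],
                    [0.0, 1.0]]

S_oblique = structure_from_pattern(A, Phi_oblique)
S_uncorrelated = structure_from_pattern(A, Phi_uncorrelated)

for name, S in (("oblique", S_oblique), ("uncorrelated", S_uncorrelated)):
    print(name, [[round(s, 4) for s in row] for row in S])
```

With correlated factors the structure differs from the pattern; with uncorrelated factors the two tables coincide, as (2.25) states.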

As already indicated, both structure and pattern should be produced in making a 
complete factor analysis. The structure reveals the correlations of variables and 
factors, which are useful for the identification of factors and for subsequent estimates 
of the latter (see chap. 16). The pattern shows the linear composition of variables in 
terms of factors in the form of regression equations. It may also be used for reproducing
the correlations between variables to determine the adequacy of the solution,
which is discussed in 2.6. In comparing different systems of factors for a given set 
of variables, again patterns are useful. 


2.6. Statistical Fit of the Factor Model 

In the preceding sections a model was developed as the mathematical theory 
underlying the observed data. This model—the set of equations (2.22)—makes the 
assumption that the variables are composed linearly of the factors. In what sense does 
a set of factors explain the relationships among the variables? That is the subject of 
the present section. 

The observed correlations among the variables constitute the primary data. What 
happens to the correlation between two variables which are approximated by the 
linear model? If, in general, such correlations derived from the factor model are little 
different from the observed correlations, the model is said to fit the empirical data 
well; otherwise, the approximation is poor and the hypothesis should be altered. 
The correlation between two variables may be reproduced from the factor pattern 
(2.22) by the following procedure: multiply any two such equations, sum over the 
number of individuals N, and divide by N. Remembering that the factors are in 
standard form, the correlation between variables z_j and z_k (j, k = 1, 2, \cdots, n) can be
expressed as follows:

(2.26)   r'_{jk} = a_{j1} a_{k1} + a_{j2} a_{k2} + a_{j3} a_{k3} + \cdots + a_{jm} a_{km}
                 + (a_{j1} a_{k2} + a_{k1} a_{j2}) r_{F_1 F_2} + \cdots + (a_{j1} a_{km} + a_{k1} a_{jm}) r_{F_1 F_m}
                 + (a_{j2} a_{k3} + a_{k2} a_{j3}) r_{F_2 F_3} + \cdots + (a_{j2} a_{km} + a_{k2} a_{jm}) r_{F_2 F_m}
                 + \cdots + a_{j1} d_k r_{F_1 U_k} + a_{k1} d_j r_{F_1 U_j} + \cdots + a_{jm} d_k r_{F_m U_k} + a_{km} d_j r_{F_m U_j}
                 + d_j d_k r_{U_j U_k},

where the reproduced correlation (computed from the pattern) is written r'_{jk} to
distinguish it from the observed correlation r_{jk}. This distinction is made throughout
the text. 

The unique factors have been assumed to be uncorrelated with the common factors
and among themselves, hence r_{F_p U_j} = r_{F_p U_k} = r_{U_j U_k} = 0. If the common factors are
uncorrelated, equation (2.26) simplifies still further. The correlations r_{F_p F_q} (p \neq q;
p, q = 1, 2, \cdots, m) are then zero, and everything below the first line of the equation
vanishes. For the case of uncorrelated common factors, the correlation between any
two variables is reproduced from the factor pattern by an equation of the following
form:

(2.27)   r'_{jk} = a_{j1} a_{k1} + a_{j2} a_{k2} + \cdots + a_{jm} a_{km}   (j \neq k; j, k = 1, 2, \cdots, n).


This expression is merely the sum of the products of corresponding pattern coefficients
of the two variables correlated. Of course, the self correlation of a variable is
unity. The factor analysis model preserves this property through the unique factor
for each variable. Thus, the reproduced self correlation for variable j can be obtained
from (2.26) by setting k = j, and, again assuming uncorrelated factors, it can readily
be seen that its value is simply the sum of the communality and the uniqueness of the
variable.
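
The reproduction formula (2.27) can be sketched as follows for uncorrelated common factors. The pattern coefficients are hypothetical, and the uniqueness is taken as one minus the communality, which is the property of the model described above rather than an independent check.

```python
def reproduced_correlation(a_j, a_k):
    """Equation (2.27): sum of products of corresponding pattern coefficients."""
    return sum(x * y for x, y in zip(a_j, a_k))

# Hypothetical common-factor coefficients for three variables on two factors.
A = [[0.8, 0.0],
     [0.6, 0.3],
     [0.0, 0.7]]

r12 = reproduced_correlation(A[0], A[1])
r23 = reproduced_correlation(A[1], A[2])

# Setting k = j gives the communality; the unique factor supplies the rest.
communality_2 = reproduced_correlation(A[1], A[1])
uniqueness_2 = 1.0 - communality_2          # d_2 squared under the model
self_correlation_2 = communality_2 + uniqueness_2

print(round(r12, 4), round(r23, 4), round(communality_2, 4), self_correlation_2)
```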

Now that the distinction between the observed and the reproduced correlations 
has been made, what should be the extent of their agreement? The correlations reproduced
by the factor pattern, as given most generally in (2.26) or for the case of
uncorrelated factors in (2.27), should not agree exactly with the observed correlations
because allowance must be made for sampling and experimental errors. It is a commonly
accepted scientific principle that a theoretical law should be simpler than the
observed data upon which it is based, and hence discrepancies between the law and
the data are to be expected. In the case of factor analysis, functions (the correlations
r'_{jk}) of the assumed linear composition of variables should be expected to vary somewhat
from the observed values.

After a factor pattern has been obtained, its adequacy as a description of the 
variables is determined from the extent to which it explains the correlations among 
the variables. This is done by forming the reproduced correlations from the pattern 
and subtracting them from the corresponding observed correlations. The resulting 
differences are known as residual correlations, and are defined by

(2.28)   \bar{r}_{jk} = r_{jk} - r'_{jk},

where r_{jk} is the observed correlation and r'_{jk} is the correlation reproduced from the
pattern. In case the common factors are uncorrelated, the residuals then reduce to
the form:

(2.29)   \bar{r}_{jk} = r_{jk} - (a_{j1} a_{k1} + a_{j2} a_{k2} + \cdots + a_{jm} a_{km}).

The question then arises as to how nearly the correlations reproduced from a
factor pattern should fit the observed ones. The agreement may be judged by the
size and distribution of the residuals. The mean of the residuals should, of
course, be approximately zero. When all common factors have been removed in
forming the residuals, then no further linkages between variables exist. It might,
therefore, be expected that the distribution of residuals would be similar to that of a
zero correlation in a sample of equal size. The standard error of such a zero correlation
is given by the formula

         \sigma_{r=0} = 1/\sqrt{N - 1}.

Since N is usually large, a standard for judging adequacy of fit may be taken to be:

(2.30)   \sigma_{\bar{r}} \leq 1/\sqrt{N},

where \sigma_{\bar{r}} is the standard deviation of the series of residuals. This standard is a crude
one. The briefest reflection would suggest that the size of residuals must depend also
on other characteristics, especially the number of variables.

If the standard deviation of the residuals is greater than 1/\sqrt{N}, it may be
concluded that there are further significant linkages between variables and that
modification of the form of solution is required. When it is considerably less than
1/\sqrt{N}, it would appear that unjustified linkages have been included in
the solution. When the standard deviation of the residuals is just below that of a
zero correlation, the solution may be regarded as acceptable in the light of the above
standard. This statistical test was recommended by Kelley.
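
The residual check described in this section can be sketched as below, assuming uncorrelated common factors. The observed correlations `R_obs`, the pattern `A`, and the sample size `N` are hypothetical, and the comparison value 1/sqrt(N) is the rough standard discussed above, not an exact significance test.

```python
import math

def residuals(R, A):
    """Observed minus reproduced correlations for every pair j < k."""
    n = len(R)
    out = []
    for j in range(n):
        for k in range(j + 1, n):
            reproduced = sum(a * b for a, b in zip(A[j], A[k]))
            out.append(R[j][k] - reproduced)
    return out

def fit_summary(R, A, N):
    """Return mean and standard deviation of the residuals, and 1/sqrt(N)."""
    res = residuals(R, A)
    mean = sum(res) / len(res)
    sd = math.sqrt(sum((x - mean) ** 2 for x in res) / len(res))
    return mean, sd, 1.0 / math.sqrt(N)

# Hypothetical data: three variables, two uncorrelated common factors, N = 100.
R_obs = [[1.00, 0.50, -0.02],
         [0.50, 1.00, 0.22],
         [-0.02, 0.22, 1.00]]
A = [[0.8, 0.0],
     [0.6, 0.3],
     [0.0, 0.7]]
N = 100

mean, sd, limit = fit_summary(R_obs, A, N)
print(round(mean, 4), round(sd, 4), round(limit, 4))
```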

2.7. Indeterminacy of Factor Solutions 

A set of phenomena can be described in a great variety of ways which are mutually
consistent. The choice of a particular interpretation must then depend upon its
utility. This arbitrariness, or indeterminacy, which has been recognized by
philosophers of science for a long time, is put succinctly by
F. R. Moulton [373, p. 484]:

    [...] interpreted consistently in various ways, in fact, in
    infinitely many ways. [...] would remember that [...] and their beliefs [...]

* [...] has not been developed. This problem is discussed in chaps. 8, 9, and 10.

The factor problem, likewise, is indeterminate in the sense that, given the
correlations of a set of variables, the coefficients of a factor pattern are not uniquely
determined. That is, systems of orthogonal, or uncorrelated, factors may be chosen,
consistent with the observed correlations, in an infinity of ways. This property has
[...] the time they had become [...] [..., pp. 112-13].

In essence, the indeterminacy in the factor model (the factor loadings a_{jp} are not
unique) arises from the fact that the model determines the m-dimensional space containing
the common factors, but it does not determine the basis or frame of reference or the
exact position of these factors. The indeterminacy occurs at two stages: in getting
an (arbitrary) solution which satisfies the model in a statistical sense, and in getting
the solution in a form most amenable to interpretation. In general the computational
methods do not yield unique loadings, an exception being the
principal-factor solution (chap. 8). [...] solutions may be put in a “canonical
form” (see 8.8). The indeterminacy at the second stage still exists (see part iii). After
a factor solution has been found that fits the empirical data (to within a given degree
of approximation), [...] another solution (fitting the data equally well) [...]
investigators in a particular field.

2.8. The Factor Model in Matrix Notation

Before leaving the general overview of factor analysis it is desirable to set forth
the ideas developed earlier in this chapter in compact matrix notation.* The advantages
are that certain properties and proofs can be shown more simply, and many of the
methods of factor analysis are developed by the use of matrices.
The n variables of a study may be designated by a column vector,
as follows:

(2.31)   z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix},

* The reader who wishes a review of the basic concepts of matrix algebra may want to turn to
chap. 3 before reading this section.

while the complete set of N values for each of the n variables will be represented by
the n x N matrix:

(2.32)   Z = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1N} \\ z_{21} & z_{22} & \cdots & z_{2N} \\ \vdots & \vdots & & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{nN} \end{bmatrix}.

Similarly, the factors may be represented as follows:

(2.33)   f = \begin{bmatrix} F_1 \\ \vdots \\ F_m \end{bmatrix},   u = \begin{bmatrix} U_1 \\ \vdots \\ U_n \end{bmatrix},

         F = \begin{bmatrix} F_{11} & F_{12} & \cdots & F_{1N} \\ \vdots & \vdots & & \vdots \\ F_{m1} & F_{m2} & \cdots & F_{mN} \end{bmatrix},   U = \begin{bmatrix} U_{11} & U_{12} & \cdots & U_{1N} \\ \vdots & \vdots & & \vdots \\ U_{n1} & U_{n2} & \cdots & U_{nN} \end{bmatrix}.

The coefficients of the factors in a factor pattern may be represented by the
following matrix:

(2.34)   M = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} & d_1 & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2m} & 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} & 0 & 0 & \cdots & d_n \end{bmatrix} = (A | D),

where A is the n x m matrix of common-factor coefficients and D is the n x n diagonal
matrix of unique-factor coefficients. Usually the matrix A alone is referred to as the
pattern matrix. The elements a_{jp} of the matrix
A are to be considered as coefficients in an oblique-factor pattern, i.e., a pattern based
upon correlated factors. In the course of a factor analysis, sometimes both an (initial)
orthogonal pattern A and a (final) oblique pattern are obtained. When there
are two distinct pattern matrices in the same context, then it is necessary to use
distinguishing notation.

Employing the foregoing definitions, the factor pattern (2.22) may be written:

(2.35)   z = M \{f | u\} = (A | D) \{f | u\} = Af + Du,

or, for the common-factor portion alone:

(2.36)   z = Af.

No new notation is used to designate the variables since the context would
make it perfectly clear whether the complete or only the common-factor portion was involved.

In addition to a pattern, a factor analysis also yields a structure, i.e., a table of
correlations of the variables with the factors (the latter being identical with the
pattern only if the common factors are uncorrelated). Of course, the correlations
of the variables with the unique factors are identical with the unique-factor
coefficients according to (2.24), and the diagonal matrix of these values has been
designated D. Denoting the correlation of variable z_j with common factor F_p by

(2.37)   s_{jp} = r_{z_j F_p},

the factor structure may be represented by:

(2.38)   S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1m} \\ s_{21} & s_{22} & \cdots & s_{2m} \\ \vdots & \vdots & & \vdots \\ s_{n1} & s_{n2} & \cdots & s_{nm} \end{bmatrix}.

The notation for the elements of a factor structure coincides with that of the
definition (2.6) of a sample covariance. The context will make it perfectly clear
which is intended.

To relate the structure to the pattern, write the common-factor portion (2.36) for all
N observations, namely,

(2.39)   Z = AF,

and postmultiply both sides of this expression by the transpose of the matrix of
factor values and divide by the scalar N to obtain:

(2.40)   ZF'/N = A(FF'/N).

The left-hand member of this equation simplifies as follows:

(2.41)   ZF'/N = S,

according to the basic definition (2.7). The matrix of correlations among the
common factors is defined by

(2.42)   \Phi = FF'/N = \begin{bmatrix} 1 & r_{F_1 F_2} & \cdots & r_{F_1 F_m} \\ r_{F_2 F_1} & 1 & \cdots & r_{F_2 F_m} \\ \vdots & \vdots & & \vdots \\ r_{F_m F_1} & r_{F_m F_2} & \cdots & 1 \end{bmatrix}.

Upon substituting (2.41) and (2.42) into (2.40), the latter expression reduces to:

(2.43)   S = A\Phi.

This is the fundamental relationship between a factor pattern A and a factor structure
S: the structure matrix is equal to the pattern matrix postmultiplied by the matrix
of correlations among the factors. From this relationship it is clear that if the factors
are uncorrelated (i.e., \Phi is an identity matrix), the elements of the structure are equal
to the corresponding elements of the pattern. The explicit expression for the pattern
matrix can be obtained from (2.43) by postmultiplying both sides by \Phi^{-1}. The
result is

(2.44)   A = S\Phi^{-1}.
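
Relation (2.44) can be illustrated for the special case of two common factors, where the inverse of the factor correlation matrix has a simple closed form. This is only a sketch: the structure `S` and the correlations `Phi` are hypothetical values, and the helper names are not from the text.

```python
def invert_2x2(P):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det],
            [-c / det, a / det]]

def matmul(X, Y):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# Hypothetical structure (variable-factor correlations) and factor correlations.
S = [[0.80, 0.40],
     [0.75, 0.60],
     [0.35, 0.70]]
Phi = [[1.0, 0.5],
       [0.5, 1.0]]

A_recovered = matmul(S, invert_2x2(Phi))   # pattern from structure, as in (2.44)
S_check = matmul(A_recovered, Phi)         # structure from pattern, as in (2.43)

print([[round(a, 4) for a in row] for row in A_recovered])
```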

It is possible to express the matrix R† of reproduced correlations (with communalities
in the diagonal) in several alternative forms, employing the foregoing
relationship between a pattern and a structure. By definition, the matrix of observed
correlations is given by

(2.45)   R = ZZ'/N.

If (2.39) is substituted into this equation, and the observed correlation matrix is
replaced by the matrix of reproduced correlations, there results:*

(2.46)   R† = AFF'A'/N = A(FF'/N)A' = A\Phi A',

where the last equality follows from (2.42).

* In replacing observed by computed correlations, the tacit assumption is made that the
residuals vanish. To avoid additional symbolism, R is employed sometimes for both matrices,
but it should be clear when it is computed from the observed variables and when it is computed
from the factor pattern.

Sometimes it is desirable to consider the reproduced correlation matrix with ones
in the diagonal, i.e., adding the uniqueness d_j^2 to the communality h_j^2 of each variable.
This matrix is represented by (R† + D^2), and equation (2.46) becomes:

(2.47)   (R† + D^2) = M \begin{bmatrix} \Phi & 0 \\ 0 & I \end{bmatrix} M',

in which the composite square matrix (of order m + n) includes the identity matrix
(of order n) of correlations among the unique factors as well as the correlation matrix
(of order m) among the common factors.

From the relationship between a pattern and a structure, alternative formulas
can be derived which obviate the explicit use of the matrix \Phi. Thus, substituting
(2.43) into (2.46) produces

(2.48)   R† = SA',

or

(2.49)   R† = AS',

since \Phi is symmetric.
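
The equivalence of the three forms (2.46), (2.48), and (2.49) can be checked numerically. The sketch below uses a hypothetical pattern `A` and hypothetical factor correlations `Phi`; the helper functions are illustrative and not part of the text.

```python
def matmul(X, Y):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

# Hypothetical oblique pattern and factor correlations.
A = [[0.8, 0.0],
     [0.6, 0.3],
     [0.0, 0.7]]
Phi = [[1.0, 0.5],
       [0.5, 1.0]]

S = matmul(A, Phi)                                 # (2.43)
R_from_phi = matmul(matmul(A, Phi), transpose(A))  # (2.46)
R_from_SA = matmul(S, transpose(A))                # (2.48)
R_from_AS = matmul(A, transpose(S))                # (2.49)

# The diagonal holds the communalities, not unities.
print([round(R_from_phi[j][j], 4) for j in range(3)])
```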

Of course, when the factors are uncorrelated, the matrix \Phi reduces to an identity
matrix and formula (2.46) simplifies to the following expression for the reproduced
correlations:

(2.50)   R† = AA'.

This equation has been called “the fundamental factor theorem” by Thurstone
[468, p. 70].

An important observation can be made at this point. The factor problem is concerned
with fitting a set of data (the observed correlations) with a model, namely the factor
pattern (2.22), or (2.35) in matrix notation. Under the assumption of such a pattern,
the correlations are reproduced by means of the common-factor coefficients alone,
as may be seen from (2.50).* For the reproduced correlation matrix R† to be an appropriate
fit to the observed correlation matrix R, the diagonal elements must also be
reproduced from the common-factor portion of the pattern. Thus, if numbers approximating
the communalities are put in the diagonal of the matrix of observed correlations,
the factor solution will involve both common and unique factors and the
foregoing formulas will appropriately reproduce all the elements comparable to the
observed data. On the other hand, if unities are placed in the principal diagonal of
the observed correlation matrix then the factor solution must necessarily involve
only common factors in order for equation (2.50) to reproduce the unities. In this
case no provision is made for unique factors, implying the component analysis model
(2.8). If such values between communalities and unities as the reliabilities are employed,
then the factor solution would involve common and error factors, but the
specificity would be included in the “common-factor” variance. From these considerations
it should be clear that the values put in the diagonal of the observed correlation
matrix determine what portions of the unit variances are factored into common
factors.


* In the discussion of this paragraph, there is no loss in generality to assume that the factors 
are uncorrelated.


Matrix Concepts Essential to Factor Analysis


3.1. Introduction

The study of systems of linear equations provided the impetus for the development
of determinants, and from the latter grew the theory of matrices.
In modern mathematical thinking it is recognized that the idea of matrix
precedes that of determinant. Furthermore, the importance
of the concept of determinant has diminished; while determinants are quite useful,
systems of equations certainly can be solved without them. Also, the bulk
of the mathematical theory of matrices can be developed without the explicit use of
determinants. Nonetheless, because of historical precedent,
the basic concepts of determinants are enumerated before matrices are introduced
in 3.2.

Many problems [...] can be formulated in terms of
systems of linear equations. While this has long been true, the advent of
high-speed digital computers gave a special impetus to the problem
of solving a system of linear equations and the associated problem of finding the
eigenvalues of a matrix (see chap. 8). Considerable research in these areas has been
conducted in recent years, resulting in excellent expositions of the theory
and numerical methods especially amenable to high-speed computers. Examples of
such works include the reports of the National Bureau of Standards [...], Faddeev
[123], and White [516]. On the other hand, [...] treatment
of systems of linear equations for use with desk calculators.

[...] of a set of simultaneous linear equations [...]
such sets of equations are presented in 3.3 and 3.4. The application of these
methods to the calculation of the inverse of a matrix is also developed.


3.2. Basic Concepts of Determinants and Matrices

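For a full and proper understanding of factor analysis it is essential that
the student have a [...]. In this
text, it is presupposed that the reader has had some exposure to this subject. The
[...] knowledge and to make the [...].*

1. Definition of a determinant of order 2.— [...]
name and notation. For example, the expression

         ad - bc

is denoted by the symbol

         \begin{vmatrix} a & b \\ c & d \end{vmatrix}

which is called a determinant of the second order since it contains two rows and two
columns.

2. Definition of a determinant of order 3.— The symbol

         \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}

is called a determinant of the third order and stands for

         a_1 b_2 c_3 - a_1 b_3 c_2 + a_2 b_3 c_1 - a_2 b_1 c_3 + a_3 b_1 c_2 - a_3 b_2 c_1.

The nine numbers a_1, \cdots, c_3 are called the elements of the determinant. In the
symbol these elements lie in three (horizontal) rows and three (vertical) columns.
[...] second column. The diagonal from the upper left-hand corner
to the lower right-hand corner is called the principal diagonal.

3. Definition of a determinant of order n.— A determinant of order n is denoted
by:

(3.1)    det A = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix},

* For detailed treatment of these subjects the reader is referred to such excellent texts as
Bodewig [43], Hadley [190], and Paige and Swift [385].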


where the n^2 elements are denoted by a's with two subscripts, the first representing
the number of the row and the second the number of the column in which the element
appears. By definition the determinant A stands for the sum of n! terms each of
which is (apart from sign) the product of n elements, one and only one from each
row and one and only one from each column. The algebraic signs are determined
by the rule for expanding the determinant, which is explained in 5 below.

4. Minors and cofactors.— The determinant of order n - 1 obtained by striking
out the row and column crossing at a given element of a determinant of order n is
called the minor of that element. Thus, corresponding to the element a_{jk} in the j-th
row and the k-th column of the determinant A, there exists the minor M_{jk} which is
obtained upon crossing out the given row and column. Frequently there is occasion to
consider not this minor M_{jk} but the cofactor A_{jk} of a_{jk} defined by

(3.2)    A_{jk} = (-1)^{j+k} M_{jk}.

The signs to be attached to the minors to obtain the corresponding cofactors
follow the checkerboard arrangement:

         + - + \cdots
         - + - \cdots
         + - + \cdots


5. Expansion of a determinant.— Any determinant may be expanded according
to the elements of any row j:

(3.3)    det A = \sum_{k=1}^{n} a_{jk} A_{jk}   (j = 1, 2, \cdots, n),

or, in terms of any column k:

(3.4)    det A = \sum_{j=1}^{n} a_{jk} A_{jk}   (k = 1, 2, \cdots, n).

Thus, for the third-order determinant

         \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}

the expansion according to the elements of the second column becomes, according
to (3.4) with k = 2:

         det A = a_{12} A_{12} + a_{22} A_{22} + a_{32} A_{32}
               = -a_{12} M_{12} + a_{22} M_{22} - a_{32} M_{32}
               = -a_{12}(a_{21} a_{33} - a_{31} a_{23}) + a_{22}(a_{11} a_{33} - a_{31} a_{13}) - a_{32}(a_{11} a_{23} - a_{21} a_{13}),

which, upon rearranging of terms, may be written as follows:

         det A = a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{32} a_{21} - a_{13} a_{22} a_{31} - a_{23} a_{32} a_{11} - a_{33} a_{21} a_{12}.

By successively applying the foregoing method, a determinant of any order
eventually can be reduced to the explicit expansion of determinants of the second
order.
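
The expansion by cofactors can be sketched as a short recursive routine. This is an illustration of (3.2) through (3.4) only; it is not an efficient method for large determinants, and the example matrix is hypothetical.

```python
def minor(M, j, k):
    """Array left after striking out row j and column k (0-based indices)."""
    return [row[:k] + row[k + 1:] for i, row in enumerate(M) if i != j]

def det(M):
    """Expansion according to the elements of the first row, as in (3.3)."""
    if not M:
        return 1  # empty array, so that a 1 x 1 determinant equals its element
    return sum((-1) ** k * M[0][k] * det(minor(M, 0, k)) for k in range(len(M)))

def det_by_column(M, k):
    """Expansion according to the elements of column k, as in (3.4)."""
    return sum((-1) ** (j + k) * M[j][k] * det(minor(M, j, k))
               for j in range(len(M)))

# Hypothetical third-order example.
M = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]

print(det(M), det_by_column(M, 1))
```

Expanding by the first row and by the second column gives the same value, as the text asserts for any row or column.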

6. Definition of a matrix.— A system of mn numbers a_{jk} arranged in a rectangular
array of m rows and n columns is called an m x n matrix. If m = n, the array is called
a square matrix of order n. A matrix will be represented by any of the following:

(3.5)    A = (a_{jk}) = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.


7. Definition of a vector.— Two special instances of a matrix occur frequently, 
namely, a single row or a single column. Any such array of 1 x n or n x 1 elements 
is called a vector, or more specifically, a row vector or a column vector, respectively. 
Simple examples of a row vector are the notations (x, y) and (x, y, z) for points in a
plane and in space, the elements of these vectors being the coordinates of the points.
Similarly, the coordinates of a point in space might be represented by a column
vector in any one of the forms:

         x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \{x_1 \; x_2 \; x_3\},

where the last expression may be used to conserve space.

8. Notation for vectors, matrices, and determinants.— It should be noted that 
even when a matrix is square it is not a determinant. A determinant whose elements 
are real numbers represents a real number, while a matrix does not have a value in
the ordinary sense. The difference between a square matrix and a determinant is
clearly seen upon interchanging the rows and columns; the determinant has the
same value, but the matrix is generally different from the original one.

Throughout this text, capital boldface letters are used to denote matrices other
than row or column vectors, while lower case boldface letters are used for vectors.
A determinant, when represented by a single letter, is printed in italic (usually
preceded by “det”). The determinant of a matrix A (see 10 below) is indicated by |A|.

9. Transpose of a matrix.— A matrix which is derived from another by interchanging
the rows and columns is called the transpose of the original matrix. Thus, if

         A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}

then the transpose of A is the matrix

         A' = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{bmatrix}.

The prime notation for the transpose is followed throughout the text. The transpose
of a row vector is a column vector, and vice versa. Thus, in the example of 7 above,
the transpose of the column vector x is the row vector x' = (x_1 x_2 x_3).

10. Determinants of a matrix.— Although square matrices and determinants are
wholly different things, it is possible to form from the elements of a square matrix
a determinant which is called the determinant of the matrix. The notation employed
is bold-face type for the matrix and vertical lines for the determinant of the matrix.
Thus, the determinant of a square matrix A is denoted by |A|. Other determinants
may be formed from any rectangular matrix by striking out certain
rows and columns. For many problems it is important to know the order of the
highest nonvanishing determinant of a matrix.

11. Rank of a matrix.— A matrix A is said to be of rank r if it contains at least
one r-rowed determinant which is not zero, whereas all determinants of A of order
higher than r are zero. In other words, the rank of a matrix is the order of the largest
non-vanishing determinant.

By the rank of a determinant is meant the rank of its matrix.

12. Singular matrix.— A square matrix is said to be singular if its determinant is
zero. Otherwise, it is called nonsingular.

13. Matrix equations.— Any two matrices A and B are said to be equal if and
only if every element of A is equal to the corresponding element of B. Thus, if
A = (a_{jk}) and B = (b_{jk}) then the equation

         A = B

implies that a_{jk} = b_{jk} for every j and k. Thus it is evident that a single matrix equation
stands for as many algebraic equations as there are elements in either of the matrices
which are equated.

14. Symmetric matrix.— A matrix A is symmetric if and only if it is equal to its
transpose. In other words, the matrix A = (a_{jk}) is symmetric in case it remains
unaltered by the interchange of its rows and columns, i.e.,

         a_{jk} = a_{kj}   (j, k = 1, 2, \cdots, n).

The following is an example of a symmetric matrix:

         \begin{bmatrix} .78 & -.16 & .23 & .04 \\ -.16 & .59 & -.34 & -.21 \\ .23 & -.34 & .86 & .40 \\ .04 & -.21 & .40 & .65 \end{bmatrix}.

All correlation matrices are symmetric.

15. Gramian matrix.— A matrix of special interest to factor analysis is frequently
referred to in psychological literature as a “Gramian” matrix, or a matrix with Gramian
properties. These properties include symmetry and positive semidefiniteness.
The symmetric characteristic of a matrix is defined in 14 above. A matrix is said to 
be positive semidefinite if all its principal minors are greater than or equal to zero. 
All correlation matrices with unities in the principal diagonal are Gramian matrices 
(see Theorem 4.5); and communality estimates as replacements for the diagonal values 
are considered “proper” only if the Gramian properties are preserved (see chap. 5). 
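
The Gramian properties as defined above (symmetry, and all principal minors greater than or equal to zero) can be checked directly for small matrices. The sketch below enumerates every principal minor, which is practical only for low orders; the two matrices are hypothetical, and the tolerance `tol` is an assumption to absorb rounding error.

```python
from itertools import combinations

def det(M):
    """Determinant by expansion along the first row."""
    if not M:
        return 1.0
    return sum((-1) ** k * M[0][k] * det([row[:k] + row[k + 1:] for row in M[1:]])
               for k in range(len(M)))

def is_gramian(M, tol=1e-12):
    """Symmetric, with every principal minor >= 0 (within tol)."""
    n = len(M)
    if any(abs(M[j][k] - M[k][j]) > tol for j in range(n) for k in range(n)):
        return False
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            # Principal minor: same index set for rows and columns.
            sub = [[M[j][k] for k in idx] for j in idx]
            if det(sub) < -tol:
                return False
    return True

# Hypothetical matrices with unities in the diagonal.
R_ok = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.4],
        [0.3, 0.4, 1.0]]
R_bad = [[1.0, 0.9, -0.9],
         [0.9, 1.0, 0.9],
         [-0.9, 0.9, 1.0]]

print(is_gramian(R_ok), is_gramian(R_bad))
```

The second array is symmetric but its third-order determinant is negative, so it could not have arisen as a correlation matrix of real data.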

16. Sum or difference of matrices.— The sum (or difference) of two matrices each 
of m rows and n columns is defined to be an m x n matrix each of whose elements is 
the sum (or difference) of the corresponding elements of the given matrices. All the 
laws of ordinary algebra hold for the addition or subtraction of matrices. 

17. Multiplication of matrices.— The element in the j-th row and the k-th column
of the product of a matrix A with n columns by a matrix B with n rows is the sum of
the products of the successive elements of the j-th row of A by the corresponding
elements of the k-th column of B.

For example, if

         A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix},   B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix},

then the product C of these matrices is

         C = A \cdot B = \begin{bmatrix} a_{11} b_{11} + a_{12} b_{21} + a_{13} b_{31} & a_{11} b_{12} + a_{12} b_{22} + a_{13} b_{32} \\ a_{21} b_{11} + a_{22} b_{21} + a_{23} b_{31} & a_{21} b_{12} + a_{22} b_{22} + a_{23} b_{32} \end{bmatrix}.

It should be noted that in this row-by-column multiplication of matrices the number 
of columns in the first matrix must be equal to the number of rows in the second. 
The product matrix then contains the number of rows of the first matrix and the 
number of columns of the second. Thus, in the example, the product of the 2 x 3 
matrix by the 3 x 2 matrix is a 2 x 2 matrix. This may be conveniently noted by 
writing the order of each matrix as superscripts, namely, 

         A^{2 \times 3} \cdot B^{3 \times 2} = C^{2 \times 2}.

In general,

(3.6)    A^{m \times n} \cdot B^{n \times s} = C^{m \times s},

that is, the product of an m x n matrix by an n x s matrix is an m x s matrix.

* In Thurstone [477, p. 10], these conditions are inadvertently ascribed to a positive definite
matrix instead of a positive semidefinite matrix. Only if all the principal minors of a matrix are
greater than zero (none equal to zero) is the matrix said to be positive definite.

Multiplication of matrices is not commutative in general, that is,

(3.7)    AB \neq BA.

Thus, in the example above the product C = AB certainly is different from the
product

         BA = \begin{bmatrix} b_{11} a_{11} + b_{12} a_{21} & b_{11} a_{12} + b_{12} a_{22} & b_{11} a_{13} + b_{12} a_{23} \\ b_{21} a_{11} + b_{22} a_{21} & b_{21} a_{12} + b_{22} a_{22} & b_{21} a_{13} + b_{22} a_{23} \\ b_{31} a_{11} + b_{32} a_{21} & b_{31} a_{12} + b_{32} a_{22} & b_{31} a_{13} + b_{32} a_{23} \end{bmatrix}.

Hence it is important to specify in what order matrices are multiplied. In the product
AB the matrix B is said to be premultiplied by the matrix A, or A is postmultiplied
by B.
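
A small numerical sketch of row-by-column multiplication, with hypothetical integer matrices of the same orders as in the example (2 x 3 and 3 x 2), shows both the rule (3.6) for the order of the product and the failure of commutativity (3.7).

```python
def matmul(A, B):
    """Row-by-column product; columns of A must equal rows of B."""
    if len(A[0]) != len(B):
        raise ValueError("columns of first matrix must equal rows of second")
    return [[sum(A[j][t] * B[t][k] for t in range(len(B)))
             for k in range(len(B[0]))]
            for j in range(len(A))]

# Hypothetical matrices: A is 2 x 3, B is 3 x 2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [2, 2]]

AB = matmul(A, B)   # 2 x 2
BA = matmul(B, A)   # 3 x 3

print(AB)
print(BA)
```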

18. Alternative rules for matrix multiplication.— In the course of numerical computations
it is sometimes more expeditious to multiply two matrices by some other
rule than the conventional row-by-column. Following is a listing of all the permutations
for given matrices A and B:

(3.8)    AB means row-by-column multiplication of A and B;
         AB' means row-by-row multiplication of A and B;
         A'B means column-by-column multiplication of A and B;
         A'B' means column-by-row multiplication of A and B.
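
As an illustration of two of the rules listed above, the sketch below forms the row-by-row and column-by-column products directly and compares them with AB' and A'B computed by the conventional rule. The matrices are hypothetical and the function names are not from the text.

```python
def matmul(A, B):
    """Conventional row-by-column product."""
    return [[sum(A[j][t] * B[t][k] for t in range(len(B)))
             for k in range(len(B[0]))]
            for j in range(len(A))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def row_by_row(A, B):
    """Element (j, k) is row j of A times row k of B, i.e. AB'."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in B] for ra in A]

def column_by_column(A, B):
    """Element (j, k) is column j of A times column k of B, i.e. A'B."""
    return [[sum(x * y for x, y in zip(ca, cb)) for cb in zip(*B)]
            for ca in zip(*A)]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

print(row_by_row(A, B), column_by_column(A, B))
```

Working row-by-row avoids writing out the transpose explicitly, which is the computational convenience the text refers to.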

19. Inner product of two vectors.—The “inner product” or “dot product” of two 
vectors is defined to be the sum of the products of pairs of corresponding numbers 
in the two vectors. Thus, if the two vectors are a = (a_1 a_2 a_3) and b = (b_1 b_2 b_3),
their inner product is given by

(3.9)    a \cdot b = a_1 b_1 + a_2 b_2 + a_3 b_3.

20. Scalars.—In order to distinguish the ordinary quantities of algebra (i.e., real 
and complex numbers) from matrices, the former are called scalars and will here be 
designated in italics. The product of a matrix A by a scalar k (kA or Ak) is defined to
be the matrix each of whose elements is k times the corresponding element of A. All 
the laws of ordinary algebra hold for the multiplication of matrices by scalars. 

21. Diagonal and scalar matrices.—A matrix in which the diagonal elements do 
not all vanish and all remaining elements are zero is called a diagonal matrix. A 
special instance of such a matrix is one in which all the elements of the diagonal are 
identical; it is then called a scalar matrix. If a scalar matrix

$$
\mathbf{K} =
\begin{bmatrix}
k & 0 & \cdots & 0 \\
0 & k & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & k
\end{bmatrix}
$$

is premultiplied or postmultiplied by any matrix A of the same order as K, the following
relationships become evident:

$$
\mathbf{KA} = \mathbf{AK} = k\mathbf{A}. \tag{3.10}
$$

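In particular, the matrix

$$
\mathbf{I} =
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\tag{3.11}
$$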

is called the identity matrix, and it has the property that, if A is any matrix whatever, 

(3.12) IA = AI = A. 

It is evident that, in matrix algebra, all scalar matrices may be replaced by the 
corresponding scalars and, conversely, that all scalars may be considered as standing
for the corresponding scalar matrices. The identity matrix I corresponds to unity
in ordinary algebra, and hence in products of matrices the factor I may be suppressed. 
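
22. Inverse matrix.—If a square matrix

$$
\mathbf{A} =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
$$

is nonsingular, i.e., $|\mathbf{A}| \neq 0$, then there exists another matrix

$$
\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|}
\begin{bmatrix}
A_{11} & A_{21} & \cdots & A_{n1} \\
A_{12} & A_{22} & \cdots & A_{n2} \\
\vdots & \vdots & & \vdots \\
A_{1n} & A_{2n} & \cdots & A_{nn}
\end{bmatrix}
=
\begin{bmatrix}
a^{11} & a^{21} & \cdots & a^{n1} \\
a^{12} & a^{22} & \cdots & a^{n2} \\
\vdots & \vdots & & \vdots \\
a^{1n} & a^{2n} & \cdots & a^{nn}
\end{bmatrix}
\tag{3.13}
$$
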

in which $A_{kj}$ denote the cofactors of the elements of A, and the matrix of these cofactors
(with $1/|\mathbf{A}|$ factored out) is called the adjoint of matrix A. The matrix $\mathbf{A}^{-1}$,
with elements denoted by $a^{jk}$, is called the inverse of A and is itself a nonsingular matrix
which has the property

$$
\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}. \tag{3.14}
$$

It should be noted that the rows and columns of cofactors in the adjoint matrix are
interchanged, i.e., the element $A_{kj}$ in the $j$th row and $k$th column of the adjoint of A
is the cofactor of the element $a_{kj}$ in the $k$th row and $j$th column of A.
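
The definition (3.13) can be followed literally for a small matrix. The sketch below is only an illustration of the definition, not a practical method (the text itself turns to more economical procedures in 3.3 to 3.5); the function names and the example matrices are assumptions made here. Exact fractions are used so that property (3.14) can be checked without rounding.

```python
from fractions import Fraction


def minor(m, i, j):
    """Matrix m with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by expansion along the first row (fine for small n)."""
    if not m:
        return 1  # determinant of the empty matrix, needed for 1 x 1 cofactors
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def inverse_by_cofactors(m):
    """Inverse as the transposed matrix of cofactors divided by |A|."""
    n = len(m)
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular; no inverse exists")
    # Row j, column k of the inverse holds the cofactor of a_kj (note the
    # interchange of rows and columns), divided by the determinant.
    return [
        [Fraction((-1) ** (j + k) * det(minor(m, k, j)), 1) / d for k in range(n)]
        for j in range(n)
    ]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical example matrices.
A2 = [[2, 1], [5, 3]]
A3 = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
print(inverse_by_cofactors(A2))  # entries equal 3, -1, -5, 2
```

The cost of the cofactor expansion grows very rapidly with the order of the matrix, which is why it serves as a definition rather than a computing scheme.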

23. Theorems on transpose and inverse of products of matrices.—The transpose
of a product of matrices is equal to the product of their transposes taken in reverse
order. Thus,

$$
(\mathbf{ABC})' = \mathbf{C}'\mathbf{B}'\mathbf{A}'. \tag{3.15}
$$

The inverse of a product of matrices is the product of their inverses taken in reverse
order. For example,

$$
(\mathbf{ABC})^{-1} = \mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1}. \tag{3.16}
$$
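
Both reversal rules are easy to confirm numerically. The following sketch uses three arbitrary nonsingular 2 × 2 matrices (chosen here purely for illustration) and exact fractions; `inv2` is a hypothetical helper that applies the closed-form inverse of a 2 × 2 matrix.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inv2(m):
    """Inverse of a 2 x 2 matrix from its determinant and cofactors."""
    (a, b), (c, d) = m
    determinant = Fraction(a * d - b * c)
    if determinant == 0:
        raise ValueError("matrix is singular")
    return [[d / determinant, -b / determinant],
            [-c / determinant, a / determinant]]


# Hypothetical nonsingular matrices.
A = [[1, 2], [3, 5]]
B = [[2, 1], [1, 1]]
C = [[1, 1], [0, 1]]

ABC = matmul(matmul(A, B), C)

# (3.15): transpose of a product = product of transposes in reverse order.
lhs_t = transpose(ABC)
rhs_t = matmul(matmul(transpose(C), transpose(B)), transpose(A))

# (3.16): inverse of a product = product of inverses in reverse order.
lhs_i = inv2(ABC)
rhs_i = matmul(matmul(inv2(C), inv2(B)), inv2(A))

print(lhs_t == rhs_t, lhs_i == rhs_i)  # True True
```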

3.3. Solution of Systems of Linear Equations: Method of Substitution

In general, a system of n linear equations in n unknowns can be solved by means
of determinants [385, p. 129]. While the determinantal method may have some
undisputed theoretical advantages, a more economical procedure is desired, especially
when dealing with a large number of variables. The systems of equations which
appear in factor analysis have symmetric matrices of coefficients and so lend themselves
to special methods of solution. Gauss’s method of substitution* produces a
routine scheme for the solution of such a set of equations, including a complete check
on the arithmetical work.

The method of substitution will be outlined in general terms and illustrated with
a simple problem involving least-squares prediction. For simplicity, suppose a
dependent variable $z_4$ is to be predicted from three independent variables by means
of the regression equation:

$$
z_4 = \beta_1 z_1 + \beta_2 z_2 + \beta_3 z_3, \tag{3.17}
$$

where the $\beta$'s are to be determined. The normal equations in this case are:

$$
\begin{aligned}
r_{11}\beta_1 + r_{12}\beta_2 + r_{13}\beta_3 &= r_{14}, \\
r_{21}\beta_1 + r_{22}\beta_2 + r_{23}\beta_3 &= r_{24}, \\
r_{31}\beta_1 + r_{32}\beta_2 + r_{33}\beta_3 &= r_{34},
\end{aligned}
\tag{3.18}
$$

where, of course, $r_{jj} = 1$ and the conditions for symmetry $r_{jk} = r_{kj}$ are satisfied for
$j, k = 1, 2, 3$. The system of equations (3.18) is to be solved for the three unknown $\beta$'s
in terms of the known correlations.

In broad outline, the method involves the solution of the first of equations (3.18)
for $\beta_1$ in terms of $\beta_2$ and $\beta_3$. This solution is then substituted in the last two of equations
(3.18). From the first of the resulting equations, $\beta_2$ is solved in terms of $\beta_3$ and
substituted into the second equation. Thus, an explicit expression for $\beta_3$ is obtained;
and then working backwards, values of $\beta_2$ and $\beta_1$ are found. The complete algebraic
expressions involved in this method of substitution are shown by Harman [196,
pp. 60-62].

* This method is referred to as the “Doolittle Solution” in many textbooks on statistics.
Convenient forms for the solution of a set of normal equations, arising in the problem of curve-fitting,
were devised by M. H. Doolittle and presented in T. W. Wright and J. F. Hayford [534,
pp. 101-24].

The substitutions can be accomplished routinely as indicated in Table 3.1 for the
following numerical example:

$$
\begin{aligned}
1.000\beta_1 + .693\beta_2 + .216\beta_3 &= .571, \\
.693\beta_1 + 1.000\beta_2 + .295\beta_3 &= .691, \\
.216\beta_1 + .295\beta_2 + 1.000\beta_3 &= .456.
\end{aligned}
\tag{3.19}
$$


Table 3.1

Illustration of the Method of Substitution

        Independent Variables     Dependent
                                  Variable
Line     z1       z2       z3        z4       Total    Instructions

Forward Solution
  1      1.       .693     .216      .571     2.480    From first eq. (3.19)
  2     -1.      -.693    -.216     -.571    -2.480    Line 1 divided by negative of lead element
  3               1.       .295      .691     1.986    From second eq. (3.19)
  4              -.480    -.150     -.396    -1.026    -.693 times line 1
  5               .520     .145      .295      .960    Sum of lines 3 and 4
  6              -1.      -.279     -.567    -1.846    Line 5 divided by negative of lead element
  7                        1.        .456     1.456    From third eq. (3.19)
  8                       -.047     -.123     -.170    -.216 times line 1
  9                       -.040     -.082     -.122    -.279 times line 5
 10                        .913      .251     1.164    Sum of lines 7, 8, 9
 11                       -1.       -.275    -1.275    Line 10 divided by negative of lead element

Back Solution
 12                        β3        .275      —       From line 11, negative of entry in z4 column
 13                        β2        .490      —       From line 6: .567 - .279(.275)
 14                        β1        .172      —       From line 2: .571 - .693(.490) - .216(.275)


The coefficients of the unknowns and the constant terms are recorded in lines 1, 3,
and 7, with the elements below the diagonal omitted. By dividing throughout by
the negative of the lead element (1 in the example), the effect is to express $\beta_1$ in terms
of $\beta_2$ and $\beta_3$, i.e.,

$$
\beta_1 = .571 - .693\beta_2 - .216\beta_3, \tag{3.20}
$$

where the constant term must be reversed in sign because it was already on the right-hand
side of the equation. The actual value of $\beta_1$ is not determined until the last step,
in line 14. By proceeding with the instructions from line 3 to line 6, the substitutions
are accomplished to express $\beta_2$ in terms of $\beta_3$, namely,

$$
\beta_2 = .567 - .279\beta_3, \tag{3.21}
$$

where, again, the derived value of the constant term must be reversed in sign. Finally,
the substitutions leading to the explicit value

$$
\beta_3 = .275 \tag{3.22}
$$

are made in lines 7 to 11. The back solution yields $\beta_3$ immediately from line 11; $\beta_2$
by substituting the computed value of $\beta_3$ into the expression (3.21) for $\beta_2$ as given in
line 6; and $\beta_1$ by substituting the values of $\beta_2$ and $\beta_3$ into (3.20) as shown schematically
in line 2.

If sums are obtained for each row, and the operations listed under “Instructions”
are applied to the “Total” column, a check on the arithmetical work can be obtained
at each step of the process. However, certain adjustments must be made for the
elimination of columns of figures in successive blocks. For example, in checking the
total in line 4, the operation of multiplying the total in line 1 by -.693 must be
adjusted for the elimination of 1.000 from the first column, to get

    -.693(2.480 - 1.000) = -1.026,

which checks precisely with the sum. Similarly, a check of the total in line 8 is given
by

    -.216(2.480 - 1.000 - .693) = -.170,

and a check of the total in line 9 is given by

    -.279(.960 - .520) = -.123,

where the last value differs from the sum of the entries in line 9 by only one unit in
the last decimal place.
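
The forward and back solutions of Table 3.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a transcription of the worksheet: the function name `solve_by_substitution` is invented here, the arithmetic is carried in floating point instead of three decimals, and the row-sum check column is omitted. Because the hand computation rounds at every step, its results (.172, .490, .275) differ in the third decimal from the unrounded values produced below.

```python
def solve_by_substitution(coeffs, consts):
    """Solve a system of linear equations by successive substitution
    (forward elimination followed by a back solution)."""
    n = len(coeffs)
    # Augment each row with its constant term, as in the z4 column of the table.
    rows = [[float(x) for x in coeffs[j]] + [float(consts[j])] for j in range(n)]

    # Forward solution: express each unknown in terms of the later ones and
    # substitute that expression into the remaining equations.
    for p in range(n):
        lead = rows[p][p]
        if lead == 0:
            raise ValueError("zero lead element; the scheme breaks down")
        for j in range(p + 1, n):
            factor = rows[j][p] / lead
            rows[j] = [x - factor * y for x, y in zip(rows[j], rows[p])]

    # Back solution: the last unknown is explicit; work backwards from it.
    beta = [0.0] * n
    for p in reversed(range(n)):
        known = sum(rows[p][k] * beta[k] for k in range(p + 1, n))
        beta[p] = (rows[p][n] - known) / rows[p][p]
    return beta


# Numerical example (3.19).
R = [[1.000, 0.693, 0.216],
     [0.693, 1.000, 0.295],
     [0.216, 0.295, 1.000]]
r4 = [0.571, 0.691, 0.456]

beta = solve_by_substitution(R, r4)
print([round(b, 3) for b in beta])  # [0.171, 0.492, 0.274]
```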

3.4. Solution of Systems of Linear Equations: Square Root Method 

An alternative to the method of the preceding section will be developed now. The 
advantages of the square root method include greater compactness, requiring less 
recording, and permitting greater ease in finding the entries to be used. Not only is 
the square root method more expedient for solving a symmetric set of equations, but 
it is especially useful in obtaining the inverse matrix in solving problems in statistics. 

The square root method will be described in general terms and illustrated with the 
same numerical example employed in the last section. In Table 3.2 computing procedures
are presented for the solution of a general system of equations (3.18), and
illustrated with the numerical data of (3.19). The step-by-step procedure, immediately 
following, is readily extended to any number of variables. 

Step 1. Enter the intercorrelations among the independent variables and their 
correlations with the dependent variable on the first three lines of the 
worksheet. 

Table 3.2

The Square Root Method

        Independent Variables            Dependent
                                         Variable
Line     z1         z2         z3           z4          Total        Check

Schematic
  1      r11        r12        r13          r14         t1           r'14
  2      *          r22        r23          r24         t2           r'24
  3      *          *          r33          r34         t3           r'34
  4      s11        s12        s13          s14         s1t          s1Σ
  5                 s22·1      s23·1        s24·1       s2t·1        s2Σ·1
  6                            s33·12       s34·12      s3t·12       s3Σ·12
  7      β1         β2         β3                       R²4·123      R4·123

Illustration
  1      1.000      .693       .216         .571        2.480        .571
  2      *          1.000      .295         .691        2.679        .691
  3      *          *          1.000        .456        1.967        .456
  4      1.000      .693       .216         .571        2.480        2.480
  5                 .721       .202         .410        1.332        1.333
  6                            .955         .262        1.217        1.217
  7      .171       .492       .274                     .563         .750

* Terms below the diagonal of a symmetric matrix are deleted for simplicity. Terms
below the diagonal of the “square root” matrix are actually zero, and are simply omitted.


Step 2. Obtain the sums by rows, i.e.,

$$
t_j = \sum_{k=1}^{4} r_{jk} \qquad (j = 1, 2, 3). \tag{3.23}
$$

Note: entries in “Check” column for lines 1, 2, 3 are described in Step 10.

Step 3. The actual square root process is begun by using $r_{11}$ as a pivot to get the
first element in line 4 simply by

$$
s_{11} = \sqrt{r_{11}}, \tag{3.24$_1$}
$$

and the remaining elements by the formula:

$$
s_{1k} = r_{1k}/s_{11} \qquad (k > 1). \tag{3.24$_2$}
$$

Note: Since $r_{11} = 1$, the elements of line 4 are equal, respectively, to the
elements of line 1.

Step 4. The calculation in the “Total” column of line 4 is carried out as for any
other column, yielding $s_{1t}$. This value should agree, except for rounding
errors, with the sum $s_{1\Sigma}$ (“Check” column) of all elements computed in
Step 3.

Step 5. The formulas for the elements of line 5 are:

$$
\begin{aligned}
s_{22\cdot 1} &= \sqrt{r_{22} - s_{12}^2}, \\
s_{2k\cdot 1} &= (r_{2k} - s_{1k}s_{12})/s_{22\cdot 1} \qquad (k > 2).
\end{aligned}
\tag{3.25}
$$

Step 6. Check line 5 by comparing the calculated value, $s_{2t\cdot 1}$, with the row sum,
$s_{2\Sigma\cdot 1}$.

Step 7. The formulas for the elements of line 6 are:

$$
\begin{aligned}
s_{33\cdot(2)} &= \sqrt{r_{33} - s_{13}^2 - s_{23\cdot 1}^2}, \\
s_{3k\cdot(2)} &= (r_{3k} - s_{1k}s_{13} - s_{2k\cdot 1}s_{23\cdot 1})/s_{33\cdot(2)} \qquad (k > 3),
\end{aligned}
\tag{3.26}
$$

where the notation $s_{3k\cdot(2)}$ is used instead of the specific $s_{3k\cdot 12}$ to suggest
an easy generalization when the number of variables already eliminated
is more than 2.

Step 8. Apply row sum check to line 6.

Step 9. The values of the regression coefficients are obtained by application of
the following formulas (back solution):

$$
\begin{aligned}
\beta_3 &= s_{34\cdot 12}/s_{33\cdot 12}, \\
\beta_2 &= (s_{24\cdot 1} - s_{23\cdot 1}\beta_3)/s_{22\cdot 1}, \\
\beta_1 &= (s_{14} - s_{13}\beta_3 - s_{12}\beta_2)/s_{11}.
\end{aligned}
\tag{3.27}
$$

Step 10. A check on the entire computations can be made by substituting the
regression coefficients into the normal equations (3.18). The results are
designated by $r'_{14}$, $r'_{24}$, $r'_{34}$ and should agree (except for rounding errors)
with the original correlations of independent with dependent variables.

Step 11. The multiple correlation coefficient can be computed by use of the usual
formula involving the $\beta$'s and $r$'s, viz.,

$$
R^2_{4\cdot 123} = \beta_1 r_{14} + \beta_2 r_{24} + \beta_3 r_{34}. \tag{3.28}
$$

From the formal solution of the three-variable problem it can be verified that

$$
\begin{bmatrix}
r_{11} & r_{12} & r_{13} \\
r_{21} & r_{22} & r_{23} \\
r_{31} & r_{32} & r_{33}
\end{bmatrix}
=
\begin{bmatrix}
s_{11} & 0 & 0 \\
s_{12} & s_{22\cdot 1} & 0 \\
s_{13} & s_{23\cdot 1} & s_{33\cdot 12}
\end{bmatrix}
\begin{bmatrix}
s_{11} & s_{12} & s_{13} \\
0 & s_{22\cdot 1} & s_{23\cdot 1} \\
0 & 0 & s_{33\cdot 12}
\end{bmatrix},
\tag{3.29}
$$

which may be expressed in matrix notation as follows:

$$
\mathbf{R} = \mathbf{S}'\mathbf{S}, \tag{3.30}
$$

whence the term “square root of a matrix” is seen to correspond to the ordinary
square root of an algebraic expression. The identity of equation (3.30) with the
fundamental theorem of factor analysis, equation (2.50), clearly indicates why factor
analysts independently discovered the square root method, although it was referred
to by various names (see 6.3). In other words, the square root method applied to a
matrix R yields a matrix S such that premultiplication by its transpose (i.e., column-by-column
multiplication of S by itself) reproduces the matrix R. It is convenient,
at times, to refer to the “square root operation,” by which is meant $(\mathbf{S}')^{-1}$, since
$(\mathbf{S}')^{-1}$ operating (premultiplying) on R produces S. Then the square root operation
can be applied to other matrices than the basic one from which it is derived.

At times it is more convenient to arrange the work in adjacent vertical sections 
rather than in horizontal blocks. This is especially true if there are a large number of 
variables to which the square root operation is to be applied. 

While the square root method may be applied to any symmetric matrix, some
difficulties will be encountered if the matrix is not positive definite. In that case certain of
the diagonal elements may turn out to be zero or negative, and the process may
degenerate or lead to imaginary numbers. When working with a correlation matrix
(with unities in the diagonal), the square root method will proceed without any
complications. However, when the diagonal values are replaced by communalities
(less than or at most equal to one), then special considerations must be made to
obtain the “real” portions of the solution (see 6.3).
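
Steps 1 to 11 can be condensed into a short Python sketch. The function names `square_root_factor` and `solve_square_root` are assumptions introduced here, the "Total" and "Check" columns are left out, and floating-point arithmetic replaces the three-decimal worksheet, so the results agree with the Illustration block of Table 3.2 only to within rounding. The sketch stops with an error when a pivot is not positive, which is the difficulty described above for matrices that are not positive definite.

```python
import math


def square_root_factor(r):
    """Upper-triangular S with S'S = r, for r symmetric and positive definite."""
    n = len(r)
    s = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = r[j][j] - sum(s[i][j] ** 2 for i in range(j))
        if pivot <= 0:
            raise ValueError("pivot not positive: matrix is not positive definite")
        s[j][j] = math.sqrt(pivot)
        for k in range(j + 1, n):
            s[j][k] = (r[j][k] - sum(s[i][j] * s[i][k] for i in range(j))) / s[j][j]
    return s


def solve_square_root(r, rhs):
    """Return (S, reduced dependent column, regression coefficients)."""
    n = len(r)
    s = square_root_factor(r)
    # Apply the square root operation to the dependent-variable column
    # (the z4 column of lines 4 to 6).
    c = []
    for j in range(n):
        c.append((rhs[j] - sum(s[i][j] * c[i] for i in range(j))) / s[j][j])
    # Back solution, formulas (3.27).
    beta = [0.0] * n
    for j in reversed(range(n)):
        beta[j] = (c[j] - sum(s[j][k] * beta[k] for k in range(j + 1, n))) / s[j][j]
    return s, c, beta


R = [[1.000, 0.693, 0.216],
     [0.693, 1.000, 0.295],
     [0.216, 0.295, 1.000]]
r4 = [0.571, 0.691, 0.456]

S, c, beta = solve_square_root(R, r4)
r_squared = sum(b * r for b, r in zip(beta, r4))  # formula (3.28)
print([round(b, 3) for b in beta])  # [0.171, 0.492, 0.274]
```

In this formulation the squared multiple correlation also equals the sum of squares of the reduced dependent column `c`, which offers one more arithmetic check.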


3.5. Calculation of the Inverse of a Matrix 

There are many situations in factor analysis where the inverse of a matrix is either 
required explicitly, or, if it were readily available, could lead to simplification in the 
work. One example is in regard to the estimation of communality, which is treated 
in chapter 5. A lower bound to the communality is the squared multiple correlation 
of a variable with the remaining variables, and the calculation of these multiple 
correlations is expedited by use of the inverse of the correlation matrix. Another 
example involves the calculation of a factor pattern from a factor structure for an 
oblique solution (see chaps. 11, 13, 15), in which the inverse of the matrix of factor 
correlations simplifies the task. The inverse of the matrix of factor correlations is 
also employed in the short method of estimating oblique factors (see 16.7). All of 
these examples point to the usefulness of an efficient means for determining the inverse 
of a given matrix. The solution to this problem can be accomplished by the methods 
of this section. 

While the inverse of a matrix is defined in 3.2, paragraph 22, such a mathematical 
statement does not provide a practical means for its calculation with numerical data. 
The methods for solving systems of linear equations described in the last two sections 
can be employed in getting the inverse of a matrix. The procedure can be demonstrated 
with the simple problem of deriving the inverse of the following matrix of correlations
among three variables:

$$
\mathbf{R} =
\begin{bmatrix}
1 & r_{12} & r_{13} \\
r_{21} & 1 & r_{23} \\
r_{31} & r_{32} & 1
\end{bmatrix}.
$$

The property (3.14), that the product of a matrix by its inverse is an identity matrix,
may be put in the form:

$$
\begin{bmatrix}
1 & r_{12} & r_{13} \\
r_{21} & 1 & r_{23} \\
r_{31} & r_{32} & 1
\end{bmatrix}
\begin{bmatrix}
r^{11} & r^{12} & r^{13} \\
r^{21} & r^{22} & r^{23} \\
r^{31} & r^{32} & r^{33}
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix},
\tag{3.31}
$$

where the elements of the inverse matrix $\mathbf{R}^{-1}$ are denoted by $r^{jk}$'s. The problem is to
determine the elements of the inverse matrix from the known $r$'s.
Upon carrying out the matrix multiplication indicated in the left-hand member
of (3.31), and setting each resulting element equal to the corresponding element of
the identity matrix on the right, the following equations are obtained:

$$
\begin{cases}
1 \cdot r^{11} + r_{12}r^{21} + r_{13}r^{31} = 1, \\
r_{21}r^{11} + 1 \cdot r^{21} + r_{23}r^{31} = 0, \\
r_{31}r^{11} + r_{32}r^{21} + 1 \cdot r^{31} = 0;
\end{cases}
\tag{3.32$_1$}
$$

$$
\begin{cases}
1 \cdot r^{12} + r_{12}r^{22} + r_{13}r^{32} = 0, \\
r_{21}r^{12} + 1 \cdot r^{22} + r_{23}r^{32} = 1, \\
r_{31}r^{12} + r_{32}r^{22} + 1 \cdot r^{32} = 0;
\end{cases}
\tag{3.32$_2$}
$$

$$
\begin{cases}
1 \cdot r^{13} + r_{12}r^{23} + r_{13}r^{33} = 0, \\
r_{21}r^{13} + 1 \cdot r^{23} + r_{23}r^{33} = 0, \\
r_{31}r^{13} + r_{32}r^{23} + 1 \cdot r^{33} = 1.
\end{cases}
\tag{3.32$_3$}
$$

It will be noted that each set of three equations involves the same coefficients,
namely, the elements of the correlation matrix R. Hence, the work can be so organized that the solution
for all the $r^{jk}$'s can be made simultaneously by either of the methods described in
3.3 and 3.4.

When the inverse of a small matrix is required it can be obtained by the method of
3.3; however, when a larger matrix is involved, the square root method will be found
more efficient. To illustrate the procedure, a matrix of correlations of six hypothetical
variables will be inverted. This is accomplished in Table 3.3, where general instructions
are given and the work is outlined in schematic form as well as the actual
calculation for the numerical example. The square root operation is applied to
the correlation matrix R and to the identity matrix, yielding S and $(\mathbf{S}')^{-1}$, respectively.
Then when the latter is premultiplied by $\mathbf{S}^{-1}$, the result is the inverse of the original
matrix R. The proof of the last result follows simply by taking the inverse of both
sides of equation (3.30):

$$
\mathbf{R}^{-1} = \mathbf{S}^{-1}(\mathbf{S}')^{-1}. \tag{3.33}
$$
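
Relation (3.33) suggests a compact computing scheme, sketched below in Python. The six-variable example of Table 3.3 is not reproduced here; instead, as an assumption made only for illustration, the 3 × 3 correlation matrix of (3.19) is inverted. The function names are likewise invented for this sketch. The square root operation is applied to the identity matrix to give $(\mathbf{S}')^{-1}$, and since $\mathbf{S}^{-1}$ is the transpose of $(\mathbf{S}')^{-1}$, the inverse is formed by column-by-column multiplication of $(\mathbf{S}')^{-1}$ by itself.

```python
import math


def square_root_factor(r):
    """Upper-triangular S with S'S = r, for r symmetric and positive definite."""
    n = len(r)
    s = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = r[j][j] - sum(s[i][j] ** 2 for i in range(j))
        if pivot <= 0:
            raise ValueError("pivot not positive: matrix is not positive definite")
        s[j][j] = math.sqrt(pivot)
        for k in range(j + 1, n):
            s[j][k] = (r[j][k] - sum(s[i][j] * s[i][k] for i in range(j))) / s[j][j]
    return s


def inverse_by_square_root(r):
    """Inverse of r from R^{-1} = S^{-1} (S')^{-1}."""
    n = len(r)
    s = square_root_factor(r)
    # u = (S')^{-1}: the square root operation applied to the identity matrix,
    # i.e. the solution of S'u = I by forward substitution, column by column.
    u = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for j in range(n):
            unit = 1.0 if j == col else 0.0
            u[j][col] = (unit - sum(s[i][j] * u[i][col] for i in range(j))) / s[j][j]
    # S^{-1} is the transpose of u, so R^{-1} = u'u (column-by-column product).
    return [[sum(u[i][a] * u[i][b] for i in range(n)) for b in range(n)]
            for a in range(n)]


R = [[1.000, 0.693, 0.216],
     [0.693, 1.000, 0.295],
     [0.216, 0.295, 1.000]]

R_inv = inverse_by_square_root(R)

# Check property (3.14): R times its inverse reproduces the identity matrix.
product = [[sum(R[a][k] * R_inv[k][b] for k in range(3)) for b in range(3)]
           for a in range(3)]
print([[round(x, 6) + 0.0 for x in row] for row in product])
```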

4

Geometric Concepts Essential to Factor Analysis


4.1. Introduction 

The understanding of factor analysis methods is enhanced by the use of geometry 
to supplement and extend the algebraic and matrix ideas. The geometric foundation 
developed in this chapter furnishes a basis for subsequent analysis and comparison 
of methods. Since the number of variables subjected to a factor analysis usually is
quite large, and since the dimension of the geometric space will be found to be
intimately related to this number, the geometry of concern will be “higher dimensional.”

After a very brief exposition of the nature of higher dimensional geometry, a
coordinate system is introduced, so that the succeeding development can be made
analytically. Then, in 4.4, the notion of linear dependence is developed, which paves
the way for one of the fundamental theorems of factor analysis. Before the application
of these geometric ideas to the factor problem is made, certain necessary formulas
for distance and angle are developed in 4.5 and 4.7 for rectangular coordinates and,
in 4.8, for general coordinates. The formulas in terms of general coordinates are
included so that a geometric interpretation of oblique forms of factorial solutions 
may be made. The theory of orthogonal transformations, presented in 4.6, forms the 
basis upon which some actual analyses are obtained in later chapters. 

In 4.9 a variable is interpreted as a point, or a vector, in higher dimensional space. 
The standard deviation of the variable then becomes a distance, and the correlation 
between two variables is the cosine of the angle between the two vectors representing 
the variables. The direct application of the geometric theorems to the fundamental 
problems of factor analysis is made in the final section. There it is shown that the 
dimension of the smallest space which contains the vectors representing a given set 
of variables is equal to the rank of the matrix of correlations with communalities in 
the diagonals. 

4.2. Geometry of N Dimensions

The concept of higher dimensions is arrived at by geometric and algebraic means. 
The notions of point, line, and plane may be generalized to higher dimensional 
objects, and extended geometric interpretations of algebraic relationships may be 
given. The deductions made in the higher dimensional spaces are based upon the 
analogous theory in three-dimensional space. 

The basic axioms [437, chap. I] for Euclidean geometry may be assumed and such 
modifications made as are necessary to insure that the space has a sufficiently high 
dimensionality. The point, straight line, and plane are taken as undefined elements, 
and later corresponding elements of higher dimensional space may be defined in 
terms of these. 

Starting with four given non-coplanar points, all the points, lines, and planes can
be obtained which constitute a three-dimensional space. The space, or manifold,
determined by these points is essentially ordinary space of three dimensions. All
that is necessary is to postulate that there is at least one point not in the three-dimensional
space to generate a four-space. The three-dimensional region does not
now constitute the whole of space but merely a subspace of the space of four dimensions.
The three-dimensional region is called a hyperplane lying in the four-space,
analogous to a plane lying in a three-space. A hyperplane in a space of four dimensions
is determined by four non-coplanar points, a point and a plane, or by two skew
lines.

Some of the elementary geometric properties of the elements in a three- and 
four-dimensional projective* space may now be enumerated. In a three-dimensional 
space two planes intersect in a line; a line cuts a plane in a point; and any three 
planes have a point in common, while four planes do not in general have a point in 
common. In a four-dimensional space two hyperplanes intersect in a plane, three 
hyperplanes intersect in a line, and four in a point, while five do not in general have 
any point in common; a hyperplane cuts a plane in a line, and a line in a point; two 
planes have in general only one point in common, and a plane and a line in general 
have no point in common. 

The notion of dimensionality may be viewed in another manner. A point in a line
is said to have one degree of freedom (of motion); in a plane, two; and in ordinary
space, three. The point being taken as element, a line is said to be of one dimension;
a plane, two; and ordinary space, three. These spaces are called linear spaces, or
flat spaces, i.e., a plane is a two-flat and ordinary space is a three-flat. An $(N - 1)$-flat
in an $N$-space will be called a hyperplane. The linear spaces point, line, plane, three-flat,
$\cdots$, hyperplane, $N$-flat are manifolds determined by one, two, three, four, $\cdots$, $N$, $N + 1$
points,† respectively, and having zero, one, two, three, $\cdots$, $N - 1$, $N$ dimensions.


* In assuming a projective space, the discussion is simplified by avoiding the special cases of
parallel elements.

† It is understood that the set of $p$ points, which determine a $(p - 1)$-flat, do not lie in a $(p - 2)$-flat.

4.3. Cartesian Coordinate System

The geometric ideas are found to be most useful and easily formulated when they
are given analytic representation. A point $P$ may thus be represented by the vector
$(x_1, x_2, \cdots, x_N)$. Each $x_i$ is a real number, and all $N$ numbers may be called a system
or $N$-tuple. By a “point” is meant simply one of the undefined elements of the space
which is characterized by a given set of axioms, so that a set of points is really an
arbitrary set of any elements whatever. On the other hand, the $N$-tuple $(x_1, x_2, \cdots, x_N)$
may be called an “arithmetic point.” A correspondence between a set of “geometric
points” and a set of “arithmetic points” is called a coordinate system.* The numbers
$x_1, x_2, \cdots, x_N$, which constitute the representation of $P$, are called the coordinates
of $P$. For purposes of factor analysis, the distinction between a “geometric” and the
corresponding “arithmetic” point is not essential, and the word “point” will be used
for either one. The notation $P:(x_i)$ will frequently be used to designate the point and
its coordinates.

An $N$-dimensional Euclidean space is assumed, and in this space a non-homogeneous
Cartesian coordinate system is set up. The points $O:(0, 0, \cdots, 0)$, and
$E_1:(1, 0, \cdots, 0)$, $E_2:(0, 1, 0, \cdots, 0)$, $\cdots$, $E_N:(0, \cdots, 0, 1)$ are called the origin and
unit points, respectively. The $N$ lines $Ox_i$ $(i = 1, 2, \cdots, N)$, each passing through
the origin and one of the unit points, are the coordinate axes. The $N$ hyperplanes†
$\pi_i = Ox_1 x_2 \cdots )x_i( \cdots x_N$, each passing through $O$ and containing $N - 1$ axes, are the
coordinate hyperplanes. A hyperplane $\pi_i$ is said to be “opposite” to the axis $Ox_i$. The
coordinates $(x_1, x_2, \cdots, x_N)$ of any point $P$ are equal, respectively, to its distances from
each coordinate hyperplane measured along a line parallel to the opposite axis; or, in
other words, the distance cut off on each axis by a hyperplane parallel to the respective
opposite coordinate hyperplane. For example, the coordinate $x_1$ is equal to the
distance (denoted by $x_1$) cut off on the $Ox_1$ axis by a hyperplane parallel to the
coordinate hyperplane $\pi_1$.

4.4. Linear Dependence 

The $N$-tuple $(x_1, x_2, \cdots, x_N)$, which represents a point $P$, may be considered as a
vector which joins the origin $O$ to the point $P$. Such a vector is sometimes called a
“radius vector.” Two fundamental operations in vector algebra are multiplication by
a number and addition of vectors. More precisely, if $P$ is a point represented by the
vector $(x_1, x_2, \cdots, x_N)$ and $c$ is any number, then according to 3.2, paragraph 20,
$cP$ is the point

$$
(cx_1, cx_2, \cdots, cx_N).
$$


* The coordinate systems introduced in this volume always produce a one-to-one correspondence
between the geometric and arithmetic points. This restriction may be removed, however.
A correspondence may carry each point P into a set of arithmetic points, as, for example, in a
homogeneous coordinate system.

† The inverted parentheses are used in the designation of any hyperplane to indicate the
omitted coordinate axis.

Also, according to 3.2, paragraph 16, if $P_1:(x_{11}, x_{12}, \cdots, x_{1N})$ and $P_2:(x_{21}, x_{22}, \cdots, x_{2N})$
are two points,* then $P_1 + P_2$ is the point

$$
(x_{11} + x_{21}, \; x_{12} + x_{22}, \; \cdots, \; x_{1N} + x_{2N}).
$$

In general, any linear combination of $m$ points, $P_1:(x_{11}, x_{12}, \cdots, x_{1N})$, $\cdots$, $P_m:(x_{m1},
x_{m2}, \cdots, x_{mN})$, may be defined by combining the two previous operations, as follows:

$$
t_1 P_1 + t_2 P_2 + \cdots + t_m P_m,
$$

where the $t$'s are any numbers. By taking varying values of the $t$'s, different linear
combinations of the original $m$ points can be obtained. Any one of these new points
may be denoted† $P(t)$ or $P(t_1, t_2, \cdots, t_m)$, with coordinates given by

$$
x_i = \sum_{q=1}^{m} t_q x_{qi} \qquad (i = 1, 2, \cdots, N), \tag{4.1}
$$

and is said to be linearly dependent on the original points $P_1, P_2, \cdots, P_m$. Each coordinate
$x_i$ of a point $P(t)$ is expressed as a linear combination of the corresponding
coordinates $x_{1i}, x_{2i}, \cdots, x_{mi}$ of the $m$ points $P_1, P_2, \cdots, P_m$. Perhaps the linear
dependence of any new points on the $m$ original points can be visualized better from
the following expanded matrix equivalent of (4.1):

$$
(x_1 \; x_2 \; \cdots \; x_N) = (t_1 \; t_2 \; \cdots \; t_m)
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1N} \\
x_{21} & x_{22} & \cdots & x_{2N} \\
\vdots & \vdots & & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mN}
\end{bmatrix}.
\tag{4.2}
$$


To clarify the foregoing ideas, consider the special case of $N = 3$ and two points
$P_1:(x_{11}, x_{12}, x_{13})$ and $P_2:(x_{21}, x_{22}, x_{23})$. All the points $P(t)$ which are linearly dependent
on the points $P_1$ and $P_2$ are given by the following coordinates:

$$
P(t_1, t_2): \quad
\begin{aligned}
x_1 &= t_1 x_{11} + t_2 x_{21} \\
x_2 &= t_1 x_{12} + t_2 x_{22} \\
x_3 &= t_1 x_{13} + t_2 x_{23}
\end{aligned}
$$

for any values whatever of $t_1$ and $t_2$. For particular values of $t_1$ and $t_2$, the first coordinate
of $P(t)$ is a linear combination of the first coordinates of $P_1$ and $P_2$; the second
coordinate is the same linear combination of the second ones; and the third coordinate
is again the same linear combination of the third ones. For example, if the
coordinates of $P_1$ are (1, 3, 4) and those of $P_2$ are (2, 1, 5) and $t_1 = 1$, $t_2 = 2$, then
$P(t)$ is given by the coordinates $x_1 = 5$, $x_2 = 5$, $x_3 = 14$.

* The double subscript notation is used on the coordinates in order to distinguish the points.
Thus $x_{qi}$ designates the $i$th coordinate of the point $P_q$.

† The symbol $P(t)$, or $P(t_1, t_2, \cdots, t_m)$, is the conventional function notation which is to be
read, “$P$ is a function of $t$ (in this case, a set of $t$'s), or $P$ is a function of $t_1, t_2, \cdots, t_m$.” On the
other hand, $P:(x_i)$ is the notation for a point $P$ with coordinates $x_i$.
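
Formula (4.1) amounts to a single weighted sum per coordinate, as the following minimal Python sketch shows for the numerical example just given. The function name `linear_combination` is an assumption made for this illustration.

```python
def linear_combination(t, points):
    """Coordinates of P(t) = t_1 P_1 + ... + t_m P_m, following (4.1)."""
    n_coords = len(points[0])
    return [sum(t_q * p[i] for t_q, p in zip(t, points)) for i in range(n_coords)]


P1 = (1, 3, 4)
P2 = (2, 1, 5)

print(linear_combination((1, 2), [P1, P2]))  # [5, 5, 14]
```

Varying the pair of weights sweeps out every point that is linearly dependent on $P_1$ and $P_2$.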

The preceding description of linear dependence can be made, alternatively, by
giving a direct definition of linear independence. Thus, a set of points $P_1, \cdots, P_m$ is
linearly independent if the $N$ conditions

$$
\sum_{q=1}^{m} t_q x_{qi} = 0 \qquad (i = 1, 2, \cdots, N) \tag{4.3}
$$

imply that $t_1 = t_2 = \cdots = t_m = 0$. This can readily be shown to be consistent with
the definition (4.1). For, if one of the coefficients were different from zero, say $t_1 \neq 0$,
then (4.3) could be written in the form

$$
x_{1i} = -\frac{t_2}{t_1} x_{2i} - \frac{t_3}{t_1} x_{3i} - \cdots - \frac{t_m}{t_1} x_{mi},
$$

and, according to (4.1), the point $P_1$ would be one of the points $P(t)$ which is linearly
dependent on the points $P_2, P_3, \cdots, P_m$. Having a positive definition of independence,
the definition of linear dependence is given by its negation, that is, a set of $m$ points
is linearly dependent if the conditions (4.3) hold for the coefficients not all zero.

When a set of points is given, it may be of interest to know how many of them are
linearly independent. Let $P_1:(x_{11}, x_{12}, \cdots, x_{1N})$, $P_2:(x_{21}, x_{22}, \cdots, x_{2N})$, $\cdots$, $P_n:(x_{n1},
x_{n2}, \cdots, x_{nN})$ be any set of $n$ points. Either all these points coincide with the origin
or at least one of them, say $P_1$, is independent. Of the remaining points, either they
will all depend upon $P_1$ or at least one of them, say $P_2$, will be independent of $P_1$.
Proceeding in this way an independent set of points, say $P_1, P_2, \cdots, P_m$, will be obtained
upon which all the points $P_1, P_2, \cdots, P_n$ will be linearly dependent. A criterion
for determining $m$ may be obtained by means of the matrix

$$
\mathbf{X} = (x_{ji}) =
\begin{bmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1N} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2N} \\
x_{31} & x_{32} & x_{33} & \cdots & x_{3N} \\
\vdots & \vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nN}
\end{bmatrix},
$$

whose rows are the $n$ points in $N$ space. An important result for linear dependence
of points, and which will be utilized later to determine the number of common
factors necessary to describe a set of variables, may be stated as:

Theorem 4.1. If $m$ is the rank of the matrix $\mathbf{X}$, the points $P_1, P_2, \cdots, P_n$ are all
dependent upon $m$ of them, which are themselves independent.

The proof of this theorem may be split into two parts. First consider the case where
$n \leq N$. By hypothesis the matrix $\mathbf{X}$ is of rank $m$, so that without loss of generality it
may be assumed that the determinant

$$
D =
\begin{vmatrix}
x_{11} & x_{12} & \cdots & x_{1m} \\
x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mm}
\end{vmatrix}
$$

is different from zero. If $m = n$ the set of equations

$$
\sum_{k=1}^{n} x_{kj} t_k = 0 \qquad (j = 1, 2, \cdots, n)
$$

have the unique solution $t_1 = t_2 = \cdots = t_n = 0$, since $D \neq 0$. Then, according to the
definition (4.3), the points $P_1, P_2, \cdots, P_n$ are linearly independent. If $m < n$ the
points $P_1, P_2, \cdots, P_m$ may be shown to be independent by the preceding argument.
This establishes the last part of the theorem.

Now to show that all $n$ points are dependent upon these, form a new matrix by
annexing a row and column to the matrix of $D$ as follows:

$$A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} & x_{1i} \\ x_{21} & x_{22} & \cdots & x_{2m} & x_{2i} \\ \vdots & \vdots & & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mm} & x_{mi} \\ x_{p1} & x_{p2} & \cdots & x_{pm} & x_{pi} \end{bmatrix}$$

where $p = m + 1, \ldots, n$ and $i$ is arbitrary. The determinant of this matrix, when
expanded according to the elements of the last column, becomes

$$|A| = x_{1i} D_{1i} + x_{2i} D_{2i} + \cdots + x_{mi} D_{mi} + x_{pi} D, \tag{4.4}$$

where $D_{1i}, D_{2i}, \ldots, D_{mi}$ are the cofactors of $x_{1i}, x_{2i}, \ldots, x_{mi}$, respectively, and $D$
is the cofactor of the last element $x_{pi}$. This expression vanishes, for, if $i \le m$, two
columns then have equal elements; and, if $i > m$, it vanishes, since the rank of $X$
is $m$ and every $(m + 1)$-order minor vanishes. The solution for $x_{pi}$ from the expression
(4.4) set equal to zero is

$$x_{pi} = -\frac{D_{1i}}{D} x_{1i} - \frac{D_{2i}}{D} x_{2i} - \cdots - \frac{D_{mi}}{D} x_{mi} \qquad (p = m + 1, \ldots, n), \tag{4.5}$$

where the constants

$$-\frac{D_{1i}}{D}, \; -\frac{D_{2i}}{D}, \; \ldots, \; -\frac{D_{mi}}{D}$$

do not depend on the elements $x_{1i}, x_{2i}, \ldots, x_{mi}, x_{pi}$. It follows from definition (4.1)
that the points $P_p$, whose coordinates are given in (4.5), are linearly dependent on
the points $P_1, P_2, \ldots, P_m$, which are themselves independent.

While the situation would not arise in ordinary application of factor analysis, to
complete the proof of the theorem consider the remaining case when $n > N$. In this
case the points $P_j\colon (x_{j1}, x_{j2}, \ldots, x_{jN}, 0, \ldots, 0)$ are in a space of $n$ dimensions. Then
the foregoing argument can be applied to obtain the relation (4.5), and thus the
theorem is established for all values of $n$.

The meaning of Theorem 4.1 may be demonstrated with the very simple example
of the three points:

$$\begin{bmatrix} 1 & 3 & 4 \\ 2 & 1 & 5 \\ 5 & 5 & 14 \end{bmatrix}.$$

This matrix is seen to be of rank two because the third-order determinant is zero
while a second-order determinant can be found which is different from zero. According
to the theorem, since the matrix is of rank two, the three points are dependent
upon two of them, which are themselves independent, a fact that was known about
these particular points from the way the third point was constructed as the sum of
the first point and twice the second.
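
The rank claimed for this example can be checked numerically. The following is a minimal sketch, not part of the original text: the helper name `matrix_rank` is an illustrative assumption, and the rank is found by ordinary Gaussian elimination over exact fractions rather than by the determinant argument used above.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over exact rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a non-zero pivot in this column, at or below the current rank row.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the column entries below the pivot.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# The three points of the example, one point per row.
points = [[1, 3, 4], [2, 1, 5], [5, 5, 14]]

# Third point rebuilt as the first point plus twice the second.
combo = [a + 2 * b for a, b in zip(points[0], points[1])]

print(matrix_rank(points), combo == points[2])
```

The first two rows alone also have rank two, so they serve as the independent points on which all three depend.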

Subspaces of the $N$-space may now be given analytical representation. If
$P_1, P_2, \ldots, P_k$ are $k$ linearly independent points, the set of all points linearly dependent
on them is called a linear k-space and is defined by the equations

$$x_i = \sum_{j=1}^{k} t_j x_{ji} \qquad (i = 1, 2, \ldots, N), \tag{4.7}$$

where the $t$'s are a set of $k$ parameters, and for each set of values $(t_1, t_2, \ldots, t_k)$ there
is a corresponding point of the linear $k$-space. Any one of the original $k$ linearly
independent points is, of course, given by definition (4.7); for example, $P_2$ is given
by $t_2 = 1$ and $t_1 = t_3 = \cdots = t_k = 0$. The $k$ points $P_1, P_2, \ldots, P_k$ are said to determine
the linear $k$-space. A linear 1-space consists of all points whose coordinates are proportional
to those of a given point $P_1\colon (x_{11}, x_{12}, \ldots, x_{1N})$, and may be called a line
through the origin. Its equations are given by

$$x_i = t_1 x_{1i} \qquad (i = 1, 2, \ldots, N). \tag{4.8}$$

In an $N$-space, these are a set of $N$ parametric equations of a line through the origin,
where $t_1$ is known as the parameter.

In a plane, a linear 1-space consists of all points proportional to a given point,
say $P_1\colon (x_{11}, x_{12})$, and passing through the origin. Its parametric equations are:

$$x_1 = t_1 x_{11}, \qquad x_2 = t_1 x_{12}. \tag{4.9}$$

Of course, this pair of equations reduces to the more elementary expression of a
straight line through the origin:

$$y = bx, \tag{4.9'}$$

where $y$ is the ordinate $x_2$, $x$ is the abscissa $x_1$, and $b$ is the slope $x_{12}/x_{11}$, derived
from the given point $P_1$.

The transitive law for linear dependence may now be indicated. All points linearly
dependent on $m$ points $P_1, \ldots, P_m$ in a linear $k$-space are contained in that $k$-space.
The coordinates of the $m$ points are given by equations of the form (4.7), and any
point linearly dependent on $P_1, \ldots, P_m$ is then obviously dependent on $P_1, \ldots, P_k$.

Furthermore, if the points $P_1, \ldots, P_k$ determine a linear $k$-space, there is no other
linear $k$-space containing these points. A linear $k$-space is thus determined by any
set of $k$ independent points contained in it, and a linear $k$-space does not contain
a set of $l$ independent points, where $l > k$. For, by definition (4.1), it is implied that
any $k$ points in a set of $l$ independent points are themselves independent, and hence
determine a linear $k$-space contained in the larger set. Theorem 4.1 may then be
stated as follows:

Theorem 4.2. If $m$ is the rank of the matrix $X$, the points $P_1, P_2, \ldots, P_n$ are all
contained in a linear $m$-space but not in a linear $p$-space, where $p < m$.

A geometric interpretation of linear dependence can now be given. The $m$ vectors
$P_q\colon (x_{q1}, x_{q2}, \ldots, x_{qN})$, $(q = 1, 2, \ldots, m)$, employed in the definition (4.1), determine
an $m$-dimensional subspace of the original $N$-space, and if $OP_1, OP_2, \ldots, OP_m$ are
taken as the coordinate axes, then $t_1, t_2, \ldots, t_m$ in (4.1) are the coordinates $x_i$ of $P(t)$.

A set of $m$ vectors is said to span an $n$-space if every vector in this space can be
expressed as a linear combination of the $m$ given vectors. An important consideration
is the smallest number of vectors which will span the space, and this turns out to be
equal to the dimension of the space provided these vectors are linearly independent.
Thus, any system of $m$ linearly independent vectors spans an entire $m$-space and forms
a basis for that space. An example of a basis is the set of unit vectors along the coordinate
axes. A basis for a space certainly is not unique. As a matter of fact, there are
an infinite number of bases for a given space [see 123, pp. 37-38]. This, in essence, is the
indeterminacy of the factor problem. The choice of a particular basis, sometimes referred
to as the “rotation” problem of factor analysis, is the subject of part iii of this text.

A linear $k$-space, as defined by equations (4.7), always contains the origin, since
the origin is linearly dependent on any set of points. The notion of subspaces of the
$N$-dimensional space may be generalized to spaces which do not include the origin.
For this purpose, a translation of coordinates,

$$y_i = x_i + c_i, \tag{4.10}$$

is defined. Then any set of points which corresponds, under a translation, to a linear
$k$-space may be called a flat k-space, or merely a k-flat. As noted in 4.2, a 0-flat is a
single point; a 1-flat is a straight line; a 2-flat is a plane; and an $(N - 1)$-flat is a
hyperplane.

4.5. Distance Formulas in Rectangular Coordinates 

When the coordinate axes are mutually orthogonal, i.e., at right angles to one 
another, the reference system set up in 4.3 is called a rectangular Cartesian system. 
Some elementary formulas in rectangular coordinates are presented in this section.

For any two vectors or points $P_1\colon (x_{11}, x_{12}, \ldots, x_{1N})$ and $P_2\colon (x_{21}, x_{22}, \ldots, x_{2N})$,
their scalar product (or inner or dot product) as defined in 3.2, paragraph 19, is given
by:

$$P_1 \cdot P_2 = \sum x_{1i} x_{2i}, \tag{4.11}$$

where summation with respect to $i$ is understood. The norm of $P_1$ is defined as the
positive square root of the inner product of $P_1$ with itself, that is,

$$N(P_1) = \sqrt{P_1 \cdot P_1} = \sqrt{\sum x_{1i}^2}, \tag{4.12}$$

and the distance between $P_1$ and $P_2$ is defined by

$$D(P_1P_2) = N(P_1 - P_2) = \sqrt{\sum (x_{1i} - x_{2i})^2}. \tag{4.13}$$

It is readily seen that the norm of a point is the distance from the origin to the point,
that is, $N(P) = D(OP)$. The distance function satisfies the following familiar conditions
of elementary geometry:

$$\begin{aligned} D(P_1P_1) &= 0, \\ D(P_1P_2) &> 0 \quad \text{if } P_1 \ne P_2, \\ D(P_1P_2) &= D(P_2P_1), \\ D(P_1P_2) + D(P_2P_3) &\ge D(P_1P_3). \end{aligned} \tag{4.14}$$


The first three of these relations are obvious. The fourth, however, requires some
proof. It may be noted that distances are invariant under translation. Thus if two
points $P_1, P_2$ are translated into two points $P_1', P_2'$, then $D(P_1P_2) = D(P_1'P_2')$, as
may be verified by applying (4.10). The fourth formula of (4.14) will therefore be
established in general if it is proved when $P_1$, $P_2$, and $P_3$ are transformed by a
translation which carries $P_2$ into the origin. Then, by (4.12) and (4.13), the inequality
of (4.14) becomes

$$\sqrt{\sum x_{1i}^2} + \sqrt{\sum x_{3i}^2} \ge \sqrt{\sum (x_{1i} - x_{3i})^2}, \tag{4.15}$$

which may be verified algebraically.

Now the equality occurs in (4.15) if, and only if,

$$x_{3i} = -u x_{1i} \qquad (i = 1, 2, \ldots, N),$$

where $u$ is a positive constant. These equations are of the form (4.8) and so represent
a straight line through the origin with the points $P_1$ and $P_3$ on opposite sides of the
origin. Hence, equality occurs in the fourth relation of (4.14) if, and only if, the
coordinates of $P_1$, $P_2$, and $P_3$ are related by equations of the form

$$A(x_{1i} - x_{2i}) + B(x_{3i} - x_{2i}) = 0, \tag{4.16}$$

where $A$ and $B$ are constants of like sign and not both zero. If the equations (4.16)
are satisfied and if $P_1 \ne P_2$ and $P_2 \ne P_3$, then $P_2$ is said to be between $P_1$ and $P_3$.
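
The formulas (4.11) to (4.14) translate directly into a few lines of code. The sketch below is illustrative only: the function names and the sample points are assumptions, and the midpoint of $P_1P_3$ is used as a convenient point known to lie between $P_1$ and $P_3$.

```python
import math


def dot(p, q):
    """Scalar product (4.11) of two points given as coordinate sequences."""
    return sum(a * b for a, b in zip(p, q))


def norm(p):
    """Norm (4.12): positive square root of the scalar product of p with itself."""
    return math.sqrt(dot(p, p))


def dist(p, q):
    """Distance (4.13): norm of the coordinate-wise difference."""
    return norm([a - b for a, b in zip(p, q)])


p1, p2, p3 = (1.0, 3.0, 4.0), (2.0, 1.0, 5.0), (5.0, 5.0, 14.0)

# Fourth relation of (4.14): slack is non-negative for any three points.
slack = dist(p1, p2) + dist(p2, p3) - dist(p1, p3)

# A point between p1 and p3 (here the midpoint) makes the slack vanish.
mid = tuple((a + b) / 2 for a, b in zip(p1, p3))
slack_between = dist(p1, mid) + dist(mid, p3) - dist(p1, p3)

# Distances are unchanged by a translation of the form (4.10).
c = (10.0, -2.0, 0.5)
shifted = [tuple(a + b for a, b in zip(p, c)) for p in (p1, p2)]

print(slack > 0, abs(slack_between) < 1e-12)
```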
4.6. Orthogonal Transformations

$$\sum (x_{1i} - x_{2i})^2 = \sum (y_{1i} - y_{2i})^2. \tag{4.17}$$

From the condition that a point $P_2$ is between two others, $P_1$ and $P_3$, if, and only if,

$$D(P_1P_2) + D(P_2P_3) = D(P_1P_3),$$

it follows that a transformation which leaves distances unaltered carries straight
lines into straight lines. Now, by a fundamental theorem of geometry, the transformation
is linear, that is, of the form

$$y_{ji} = \sum_{k=1}^{N} \alpha_{ik} x_{jk} + c_i \qquad (i = 1, 2, \ldots, N;\; j = 1, 2, 3, \ldots). \tag{4.18}$$

Upon substituting the values of $y_{1i}$ and $y_{2i}$ from (4.18), equation (4.17) becomes

$$\sum_{i=1}^{N} (x_{1i} - x_{2i})^2 = \sum_{i=1}^{N} \left[ \sum_{k=1}^{N} \alpha_{ik} (x_{1k} - x_{2k}) \right]^2. \tag{4.19}$$

It remains to find the conditions which the $\alpha$'s must satisfy in order that (4.19)
should hold, and then the most general transformation which preserves distance
will be specified. The right-hand side of (4.19) can be written as follows:

$$\sum_{i=1}^{N} \sum_{k=1}^{N} \sum_{l=1}^{N} \alpha_{ik} \alpha_{il} (x_{1k} - x_{2k})(x_{1l} - x_{2l}).$$

Hence, equation (4.19) is satisfied when

$$\sum_{i=1}^{N} \alpha_{ik} \alpha_{il} = \delta_{kl}, \tag{4.21}$$

where $\delta_{kl}$ is the Kronecker delta, which is equal to unity if $k = l$ and equal to zero
if $k \ne l$. Any linear homogeneous transformation,

$$y_{ji} = \sum_{k=1}^{N} \alpha_{ik} x_{jk}, \tag{4.22}$$

whose coefficients satisfy (4.21) is called orthogonal, and its matrix an orthogonal
matrix. The following theorem has thus been established:

Theorem 4.3. The distance between any two points is an invariant under a general
rigid motion, that is, an orthogonal transformation followed by a translation.

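Let the points before and after the transformation in the $N$-space be represented
by rows of the following matrices:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & \vdots & & \vdots \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1N} \\ y_{21} & y_{22} & \cdots & y_{2N} \\ \vdots & \vdots & & \vdots \end{bmatrix}.$$

Then, if

$$T = \begin{bmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1N} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2N} \\ \vdots & \vdots & & \vdots \\ \alpha_{N1} & \alpha_{N2} & \cdots & \alpha_{NN} \end{bmatrix},$$

the transformation (4.22) becomes:

$$Y = XT'. \tag{4.23}$$

The transformation matrix $T$ is said to be orthogonal if and only if

$$T'T = I. \tag{4.24}$$

In other words, the conditions (4.21) that the coefficients $\alpha_{ik}$ must satisfy in order
that the transformation (4.22) be orthogonal imply an identity matrix. It also follows
that a matrix $T$ is an orthogonal matrix if it satisfies the condition (4.24).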
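
Conditions (4.21) and (4.24) and the invariance stated in Theorem 4.3 can be illustrated with a plane rotation. This is a sketch under stated assumptions: the rotation through 30 degrees, the two sample points, and the helper names are chosen only for illustration.

```python
import math


def transform(points, t):
    """Apply (4.22), y_ji = sum_k t[i][k] * x_jk, to every row of `points` (Y = XT')."""
    return [[sum(t_i[k] * x[k] for k in range(len(x))) for t_i in t] for x in points]


def gram(t):
    """Return T'T, whose (k, l) element is sum_i t[i][k] * t[i][l], as in (4.21)."""
    n = len(t)
    return [[sum(t[i][k] * t[i][l] for i in range(n)) for l in range(n)] for k in range(n)]


def dist(p, q):
    """Distance (4.13) in rectangular coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


theta = math.radians(30.0)  # illustrative rotation angle
T = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

X = [[1.0, 2.0], [4.0, 6.0]]  # two points, one per row
Y = transform(X, T)

identity_check = gram(T)          # should be the identity up to rounding
before = dist(X[0], X[1])
after = dist(Y[0], Y[1])
print(before, after)
```

Adding the same constants $c_i$ to every row of `Y` would complete the general rigid motion without changing the distance.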

4.7. Angular Separation between Two Lines 

Other geometric ideas that are useful in factor analysis center around the notion
of the angle between two lines. The only characteristic of a point is its position, as
given by its coordinates in a frame of reference. A line is ordinarily distinguished
not by coordinates but by its inclinations to the respective coordinate axes. The
angles which a line $OP$ makes with the axes, i.e., $\theta_i = \angle POx_i$, are called the direction
angles of the line, and their cosines are called direction cosines. If the norm $N(P)$,
i.e., the distance $D(OP)$, is denoted by $\rho$, then the direction cosines are given by

$$\lambda_i = \cos\theta_i = x_i/\rho \qquad (i = 1, 2, \ldots, N). \tag{4.25}$$

By (4.12),

$$\rho^2 = \sum x_i^2,$$

and substituting the value of $x_i$ from (4.25), gives

$$\sum \lambda_i^2 = \sum \cos^2\theta_i = 1. \tag{4.26}$$

This property, that the sum of the squares of the direction cosines of a line in $N$-space
is equal to unity, is a direct extension of the one in ordinary space.

The parametric equations of a line through the origin $O$ and a fixed point $P_1\colon (x_{1i})$
are given by (4.8). The coordinates of any point $P\colon (x_i)$ on a line through the origin
with the direction cosines $\lambda_i$ are given by (4.25). When $\rho$ is taken as a parametric
variable along the line, the $N$ equations (4.25) can be regarded as the equations of
the line, which may be written

$$\rho = \frac{x_i}{\lambda_i} \qquad (i = 1, 2, \ldots, N).$$

Upon equating the $N$ expressions for $\rho$, the following $N - 1$ equations arise:

$$\frac{x_1}{\lambda_1} = \frac{x_2}{\lambda_2} = \cdots = \frac{x_N}{\lambda_N}. \tag{4.27}$$

If $P\colon (x_1, x_2, \ldots, x_N)$ is taken as a variable point on the line, (4.27) can be regarded as
the equations of the line.

By means of a translation, of the form (4.10), the equations of a line $AP$ through
an arbitrarily fixed point $A\colon (a_1, a_2, \ldots, a_N)$ and with the direction cosines $\lambda_i$ are
transformed from (4.27) to

$$\frac{x_1 - a_1}{\lambda_1} = \frac{x_2 - a_2}{\lambda_2} = \cdots = \frac{x_N - a_N}{\lambda_N}. \tag{4.28}$$

Moreover, if

$$\lambda_i = b\,l_i \qquad (i = 1, 2, \ldots, N), \tag{4.29}$$

where $b$ is a constant different from zero, the equations of the line $AP$ may be written
in the form

$$\frac{x_1 - a_1}{l_1} = \frac{x_2 - a_2}{l_2} = \cdots = \frac{x_N - a_N}{l_N}, \tag{4.30}$$

where the $l_i$ are not now equal to, but only proportional to, the direction cosines.
The numbers $l_i$ are called direction numbers of the line.

The actual direction cosines of a line can readily be obtained from the numbers
proportional to them. For, squaring both sides of (4.29) and summing for $i$, this
equation becomes

$$b^2 \sum l_i^2 = \sum \lambda_i^2 = 1,$$

where the last equality follows from (4.26). Then the constant of proportionality is

$$b = \frac{1}{\sqrt{\sum l_i^2}},$$

and the direction cosines are given by

$$\lambda_i = \frac{l_i}{\sqrt{\sum l_i^2}}. \tag{4.31}$$

Hence (4.30) may be taken as the general form of the $(N - 1)$ equations of a line in
$N$-space.

The coordinates of any point $P\colon (x_i)$ on a line through $A\colon (a_i)$ with direction numbers
$l_i$ are

$$x_i = a_i + t\,l_i \qquad (i = 1, 2, \ldots, N), \tag{4.32}$$

where $t$ is the common value in (4.30). Equations (4.32) may be regarded as a system
of parametric equations of a line through a fixed point. The distance $D(AP)$ along the
line from the fixed point $A$ to any position of the variable point $P$ is

$$D(AP) = \sqrt{\sum (x_i - a_i)^2} = t\sqrt{\sum l_i^2},$$

so that

$$t = \frac{D(AP)}{\sqrt{\sum l_i^2}}. \tag{4.33}$$

It is thus evident that the parameter $t$ in equations (4.32) is proportional to the
distance from the fixed point to a variable point on the line and is equal to this
distance when the equations of the line are given in terms of the direction cosines.

Now a formula for the cosine of the angle between two lines in $N$-space may be
derived. When two lines meet in a point,* a plane can be drawn through the point
containing the two lines, and their inclination can be obtained from the trigonometric
properties of a triangle in the plane. Let the two lines through $A\colon (a_i)$ be represented
by the equations

$$\begin{aligned} \frac{x_1 - a_1}{\lambda_1} &= \frac{x_2 - a_2}{\lambda_2} = \cdots = \frac{x_N - a_N}{\lambda_N}, \\ \frac{y_1 - a_1}{\mu_1} &= \frac{y_2 - a_2}{\mu_2} = \cdots = \frac{y_N - a_N}{\mu_N}, \end{aligned} \tag{4.34}$$

where the $x_i$ and $y_i$ are the coordinates of the variable points on the lines, and the
$\lambda_i$ and $\mu_i$ are the direction cosines of the lines. On the first line take any point $P$ at
a distance $p$ from $A$; on the second line take any point $Q$ at a distance $q$ from $A$;
and connect the points $P$ and $Q$ with a line, which necessarily lies in the plane. The
points and lines are plotted in the plane of the two given lines in Figure 4.1.

* If the lines do not meet in a point, the angle between the lines may be defined as the angle
which one of the lines makes with a line parallel to the second, which intersects the first line.

Let $\phi$ = angle $PAQ$ and let $d = D(PQ)$; then the law of cosines applied to the
triangle $PAQ$ gives

$$d^2 = p^2 + q^2 - 2pq\cos\phi. \tag{4.35}$$

The distance $d$ is also given by formula (4.13), in which, according to (4.32), the
coordinates of $P$ are $x_i = a_i + p\lambda_i$ and those of $Q$ are $y_i = a_i + q\mu_i$, so that

$$\begin{aligned} d^2 &= \sum (x_i - y_i)^2 = \sum (p\lambda_i - q\mu_i)^2 \\ &= p^2 \sum \lambda_i^2 + q^2 \sum \mu_i^2 - 2pq \sum \lambda_i \mu_i \\ &= p^2 + q^2 - 2pq \sum \lambda_i \mu_i, \end{aligned} \tag{4.36}$$

since $\sum \lambda_i^2 = \sum \mu_i^2 = 1$ by (4.26). When the terms of (4.36) are identified with the
corresponding ones of (4.35), the following result is obtained:

$$\cos\phi = \sum \lambda_i \mu_i. \tag{4.37}$$

Thus the cosine of the angle of separation of two lines is given by the sum of the
products of corresponding direction cosines of the lines, i.e., the scalar product of
the vectors $(\lambda_1, \lambda_2, \ldots, \lambda_N)$ and $(\mu_1, \mu_2, \ldots, \mu_N)$.

By means of formula (4.37) another expression for the scalar product of two vectors
or points may be written in place of (4.11). The coordinates of the two points $P_1\colon (x_{1i})$
and $P_2\colon (x_{2i})$ may be expressed as follows:

$$x_{1i} = \rho_1 \lambda_{1i}, \qquad x_{2i} = \rho_2 \lambda_{2i} \qquad (i = 1, 2, \ldots, N),$$

where $\rho_1, \rho_2$ are the respective distances from the origin to the points, and $\lambda_{1i}, \lambda_{2i}$
are the direction cosines of the lines $OP_1$ and $OP_2$. Then, substituting these values
in (4.11), there arises

$$P_1 \cdot P_2 = \sum x_{1i} x_{2i} = \rho_1 \rho_2 \sum \lambda_{1i} \lambda_{2i},$$

which, according to (4.37), reduces to

$$P_1 \cdot P_2 = \rho_1 \rho_2 \cos\phi_{12}, \tag{4.38}$$

where $\phi_{12}$ is the angle $P_1OP_2$. Formula (4.38) states that the scalar product of two
vectors is the product of the lengths of the vectors by the cosine of their angular
separation. This is very often taken as the definition of the scalar product.
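
Formulas (4.25), (4.26), (4.37), and (4.38) can be verified on small examples. The sketch below assumes rectangular coordinates and lines through the origin; the helper names and sample points are illustrative, not taken from the text.

```python
import math


def direction_cosines(p):
    """Direction cosines (4.25) of the line from the origin to point p."""
    rho = math.sqrt(sum(x * x for x in p))
    return [x / rho for x in p]


def cos_angle(p, q):
    """Cosine of the angle between lines Op and Oq by (4.37)."""
    lam = direction_cosines(p)
    mu = direction_cosines(q)
    return sum(a * b for a, b in zip(lam, mu))


p1 = [1.0, 0.0, 0.0]
p2 = [1.0, 1.0, 0.0]

lam = direction_cosines([1.0, 2.0, 2.0])     # here rho = 3
sum_of_squares = sum(v * v for v in lam)     # property (4.26)

angle_degrees = math.degrees(math.acos(cos_angle(p1, p2)))

# Scalar product rebuilt from lengths and angle, as in (4.38).
rho1 = math.sqrt(sum(x * x for x in p1))
rho2 = math.sqrt(sum(x * x for x in p2))
scalar_product = rho1 * rho2 * cos_angle(p1, p2)

print(sum_of_squares, angle_degrees, scalar_product)
```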

4.8. Distance and Angle in General Cartesian Coordinates

In sections 4.5 and 4.7 formulas for distance and angle are presented in terms of
rectangular coordinates. Now the restriction that the coordinate axes are mutually
orthogonal will be removed, and more general formulas obtained. The formulas
for distance and angle will then be in terms of general Cartesian coordinates, and
will simplify to the preceding ones when the angles between all pairs of reference
axes are right angles.

The general Cartesian coordinate system contains $N$ reference axes $Ox_i$ which
are inclined to one another at angles $\theta_{ik}$. The formulas for
distance and angle in terms of general Cartesian coordinates will involve the
inclinations of these axes.

Formulas for the distance function in general coordinates will first be given. In
the plane the square of the length of the radius vector $OP$ is readily found to be

$$\rho^2 = [D(OP)]^2 = x_1^2 + x_2^2 - 2x_1x_2\cos(180^\circ - \theta_{12}) = x_1^2 + x_2^2 + 2x_1x_2\cos\theta_{12}.$$

This formula follows immediately on applying the law of cosines to the triangle
$POM$, indicated in Figure 4.2.

[Fig. 4.2. Distance from origin to a point in general coordinates]

By induction, it can be shown that in $N$-space the distance $\rho$ from the origin $O$ to
an arbitrary point $P\colon (x_1, x_2, \ldots, x_N)$ is given by

$$\rho = \sqrt{\sum\sum x_i x_k \cos\theta_{ik}}, \tag{4.39}$$

where $\sum\sum$ indicates summation for $i$ and $k$ from 1 to $N$. This convention for the
double summation will be employed throughout this section. Similarly, the distance
between any two points

$$P_1\colon (x_{11}, x_{12}, \ldots, x_{1N}) \quad \text{and} \quad P_2\colon (x_{21}, x_{22}, \ldots, x_{2N})$$

may be shown to be given by the following formula:

$$D(P_1P_2) = \sqrt{\sum\sum (x_{1i} - x_{2i})(x_{1k} - x_{2k})\cos\theta_{ik}}. \tag{4.40}$$

The relations (4.39) and (4.40) reduce to the corresponding formulas (4.12) and (4.13)
for distance in rectangular coordinates, inasmuch as $\cos\theta_{ik} = 0$ for every $i \ne k$.

There will now be given several properties of a line in terms of more general (not
necessarily rectangular) coordinates. The direction of a line $OP$ is determined by the
ratios of the coordinates of an arbitrary point $P\colon (x_1, x_2, \ldots, x_N)$ on the line to the
length $\rho = D(OP)$ from the origin to the point. These ratios, denoted by $\lambda_i = x_i/\rho$,
are called the direction ratios of the line $OP$. Then, if the coordinates of $P$ are expressed
in the form

$$x_i = \rho\lambda_i, \qquad x_k = \rho\lambda_k,$$

and substituted in (4.39), there results

$$\sum\sum \lambda_i \lambda_k \cos\theta_{ik} = 1. \tag{4.42}$$

The direction ratios become the direction cosines of a line when a general Cartesian
coordinate system is specialized to a rectangular one. Then formula (4.42) reduces
to (4.26).

A formula for the angle between two lines, in general coordinates, can now
be deduced. For simplicity, let the two lines pass through the origin and be
distinguished by the direction ratios $\lambda_i$ and $\mu_i$ $(i = 1, 2, \ldots, N)$, respectively. Select a point
$P\colon (x_i)$ on the first line and a point $Q\colon (y_i)$ on the second line, and let $p = D(OP)$,
$q = D(OQ)$, $d = D(PQ)$, and $\phi$ = angle $POQ$. Then

$$d^2 = p^2 + q^2 - 2pq\cos\phi.$$

But $d$ is also given by (4.40), in which the coordinates of $P$ are $x_i = p\lambda_i$ and those of
$Q$ are $y_i = q\mu_i$, so that after these values are substituted, the formula becomes

$$d^2 = p^2 + q^2 - 2pq \sum\sum \lambda_i \mu_k \cos\theta_{ik}.$$

By equating the two expressions for the square of the distance, the following formula
for the angle between two lines is obtained:

$$\cos\phi = \sum\sum \lambda_i \mu_k \cos\theta_{ik}. \tag{4.43}$$

This formula reduces to (4.37) when the axes make right angles with one another.
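
The general-coordinate formulas (4.39), (4.40), and (4.43) can be tried out in the plane. The following sketch assumes that the inclinations of the axes are supplied as a table of cosines `cos_theta[i][k]`, with ones on the diagonal; the 60-degree inclination and the function names are illustrative assumptions.

```python
import math


def oblique_norm(x, cos_theta):
    """Distance (4.39) from the origin: sqrt(sum_i sum_k x_i x_k cos(theta_ik))."""
    n = len(x)
    return math.sqrt(sum(x[i] * x[k] * cos_theta[i][k] for i in range(n) for k in range(n)))


def oblique_distance(x, y, cos_theta):
    """Distance (4.40) between two points in general Cartesian coordinates."""
    return oblique_norm([a - b for a, b in zip(x, y)], cos_theta)


def oblique_cos_angle(x, y, cos_theta):
    """Cosine (4.43) of the angle between lines Ox and Oy, using direction ratios."""
    lam = [v / oblique_norm(x, cos_theta) for v in x]
    mu = [v / oblique_norm(y, cos_theta) for v in y]
    n = len(x)
    return sum(lam[i] * mu[k] * cos_theta[i][k] for i in range(n) for k in range(n))


# Plane with axes inclined at 60 degrees (illustrative choice).
c = math.cos(math.radians(60.0))
oblique_axes = [[1.0, c], [c, 1.0]]
rectangular_axes = [[1.0, 0.0], [0.0, 1.0]]

rho = oblique_norm([1.0, 1.0], oblique_axes)                       # sqrt(1 + 1 + 2 cos 60)
axis_cos = oblique_cos_angle([1.0, 0.0], [0.0, 1.0], oblique_axes)  # angle between the axes
rect = oblique_distance([3.0, 4.0], [0.0, 0.0], rectangular_axes)   # reduces to (4.13)

print(rho, axis_cos, rect)
```

With the rectangular table every off-diagonal cosine is zero, so the same functions reproduce (4.12), (4.13), and (4.37).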

4.9. Geometric Interpretation of Correlation 

In this and the following section there will be presented a number of applications
of the preceding geometric ideas to the factor problem. The raw data are the values
of $n$ variables for each of $N$ individuals. When the variables are in standard form,
the matrix of such values:

$$Z = (z_{ji}) = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1N} \\ z_{21} & z_{22} & \cdots & z_{2N} \\ \vdots & \vdots & & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{nN} \end{bmatrix}$$

may be interpreted as containing (by rows) the rectangular Cartesian coordinates
of $n$ points $z_j$ $(j = 1, 2, \ldots, n)$ in an $N$-space. Similar interpretation can be made of
the elements in the matrix $X = (x_{ji})$ of deviate values.

The length of the radius vector to a point $x_j$, according to formula (4.12), is

$$\rho_j = \sqrt{\sum x_{ji}^2}. \tag{4.44}$$

According to definition (2.4), however, this expression simplifies as follows:

$$\rho_j = \sqrt{N}\, s_j. \tag{4.45}$$

The standard deviation of a variable may thus be interpreted as being proportional
to the distance from the origin to the point representing the variable, the constant
of proportionality being $1/\sqrt{N}$.

By way of geometric representation of a set of values of two variables, $x_j$ and $x_k$,
it is customary to think of the $x_{ji}$ and the $x_{ki}$ as the coordinates of $N$ points $(x_{j1}, x_{k1})$,
$(x_{j2}, x_{k2}), \ldots, (x_{jN}, x_{kN})$ in the plane $x_jOx_k$. This plot of points is called a scatter
diagram; and, by means of this representation, a better understanding of the relations
involved in the definition (2.7) of a coefficient of correlation can be obtained. In
general, for $n$ variables this will be referred to as the point representation.

Even more important in some respects is a geometric representation not by $N$
points in a plane (for two variables) but by two points in an $N$-space. The two variables
are then represented by the vectors $x_j\colon (x_{j1}, x_{j2}, \ldots, x_{jN})$ and $x_k\colon (x_{k1}, x_{k2}, \ldots, x_{kN})$.
Such a configuration for $n$ variables will be called the vector representation.

The interpretation of the n rows of matrix Z as the coordinates of n points in an 
N-space is the vector representation of the variables. On the other hand, the same 
numbers in the matrix Z may be read in sets by columns to give N points in an 
n-space. In the latter case there will be a swarm of N points in the point representation 
of the n variables. While both concepts are employed in factor analysis, somewhat 
greater use is made of the notion of n vectors to represent the variables in a space 
corresponding to the N individuals. 

If the direction cosines of these vectors are denoted by $\lambda_{ji}$ and $\lambda_{ki}$, respectively,
then by (4.25),

$$\lambda_{ji} = \frac{x_{ji}}{\rho_j}, \qquad \lambda_{ki} = \frac{x_{ki}}{\rho_k} \qquad (i = 1, 2, \ldots, N), \tag{4.46}$$

where $\rho_j = D(Ox_j)$ and $\rho_k = D(Ox_k)$. Inserting these values in formula (4.37), it
becomes

$$\cos\phi_{jk} = \frac{\sum x_{ji} x_{ki}}{\rho_j \rho_k}, \tag{4.47}$$

where $\phi_{jk}$ is the angle of separation of the two lines. Then $\cos\phi_{jk}$ may be interpreted
as the scalar product of the vectors $x_j$ and $x_k$ divided by the product of the lengths
of these vectors. Upon substituting the values for $\rho_j$ and $\rho_k$ from (4.45), formula (4.47)
reduces to:

$$\cos\phi_{jk} = \frac{\sum x_{ji} x_{ki}}{N s_j s_k} = r_{jk} \qquad (j, k = 1, 2, \ldots, n). \tag{4.48}$$

The coefficient of correlation between two variables (measured as deviates from their
respective means) is the cosine of the angle between their vectors in N-space.
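
The identity (4.48) can be checked on a small set of scores. In the sketch below the two score lists are made-up illustrative data, the function names are assumptions, and the standard deviation is taken with divisor $N$, which is the form implied by (4.45).

```python
import math


def deviates(values):
    """Express raw values as deviates from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cosine(u, v):
    """Cosine of the angle between two vectors, as in (4.47)."""
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def std_dev(x):
    """Standard deviation of deviate values with divisor N, consistent with (4.45)."""
    return math.sqrt(sum(a * a for a in x) / len(x))


def correlation(u, v):
    """Product-moment correlation: sum of deviate products over N * s_j * s_k."""
    x, y = deviates(u), deviates(v)
    return sum(a * b for a, b in zip(x, y)) / (len(x) * std_dev(x) * std_dev(y))


# Hypothetical raw scores of N = 4 individuals on two variables.
scores_j = [2.0, 4.0, 6.0, 8.0]
scores_k = [1.0, 3.0, 2.0, 5.0]

x_j, x_k = deviates(scores_j), deviates(scores_k)
r_jk = correlation(scores_j, scores_k)
cos_phi = cosine(x_j, x_k)

# Length of the vector x_j compared with sqrt(N) * s_j, as in (4.45).
rho_j = math.sqrt(sum(a * a for a in x_j))
print(r_jk, cos_phi, rho_j, math.sqrt(len(x_j)) * std_dev(x_j))
```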

While the raw data of a study are the values of the variables, whether they are in 
arbitrary units or standardized, such data are usually reduced to correlation
coefficients among the variables before a factor analysis is made. For this reason, it is
important to put some of the preceding geometric notions in terms of the correlation 
matrix R instead of the matrix of standardized values Z. In particular, it is desirable 
to have an interpretation of Theorem 4.2 in terms of the correlation matrix. To this 
end, the following is required first: 

Theorem 4.4. The rank of the product of a matrix by its transpose is equal to the 
rank of the matrix. 

The proof follows immediately from another theorem [see 190, p. 138], which states 
that the rank of the product of any two matrices is less than or equal to the minimum 
of the rank of either one. Since the two matrices under consideration in Theorem 4.4 
are transposes of one another, and have the same rank, the rank of the product is 
equal to this common rank. 

The product of the matrix $Z$ by its transpose $Z'$ is equal to the correlation matrix $R$
multiplied by the scalar $N$, viz.,

$$ZZ' = NR, \tag{4.49}$$

which is readily seen by carrying out the matrix multiplication and substituting the
definition of the variances and correlations for standardized variables. Since the
theorem is concerned only with the ranks of the matrices, the non-zero factor $N$ is
irrelevant to it. The right-hand side of (4.49) might be represented by $R^*$ to be
absolutely rigorous, but for simplicity it will be written $R$. With this understanding,
the theorem may be restated as follows: If $m$ is the rank of matrix $Z$, then the rank
of $R$ is equal to $m$. In other words, the rank of the correlation matrix is equal
to the rank of the matrix of observed values.

A more powerful relationship between these two matrices (and also between the 
reproduced correlation matrix and the factor matrix) is the following: 

Theorem 4.5. If $Z$ is an $n$ by $N$ matrix of rank $m$, with real elements, then $ZZ' = R$
is a positive semidefinite real symmetric matrix (Gramian) of rank $m$.*

* For proof see A. A. Albert [7, p. 65].

4.10. Subspaces Employed in Factor Analysis

The geometric notions introduced in this chapter make it possible to determine
the minimum number of common factors necessary to describe a set of variables in
the sense of equations (2.22). According to Theorem 4.2, the $n$ points whose coordinates
are given in the matrix $Z$ are all contained in a linear $m$-space, where $m$
is the rank of the matrix. In other words, the $n$ vectors can be described in terms of
$m$ reference vectors. Furthermore, since the rank of the correlation matrix $R$ is
equal to the rank of the matrix of standardized values $Z$ (according to Theorem 4.4),
any property of the variables which is inferred only from the rank of the latter matrix
may be stated in terms of the correlation matrix. It therefore follows from Theorem 4.2
that the $n$ variables can be expressed as linear functions of not less than $m$ factors,
where $m$ is the rank of the correlation matrix.

In case of the component analysis model (2.8), the correlation matrix contains ones 
in the diagonal and its rank usually is n. The variables would then be describable in 
terms of not less than n factors. If it is desired to describe the n variables in terms of 
fewer than n common factors, a pattern of the form (2.22) may be postulated. From 
such a pattern the correlations are reproduced, as before, but with communalities 
in place of ones in the diagonal. Then a factor pattern of the desired form can be 
obtained by employing a reduced correlation matrix, i.e., a correlation matrix with 
the ones replaced by communalities. The rank of this matrix is generally less than 
the order n. By the preceding argument, it is therefore apparent that the number of 
common factors in the pattern is equal to m, the rank of the reduced correlation 
matrix. This is the smallest number of factors that will account for the intercorrelations.
Stated geometrically, the smallest space containing the $n$ points is a flat $m$-space.
Such a space will be referred to as the common-factor space. For purposes of reference
the above ideas may be recapitulated in the following: 

Theorem 4.6. If m is the rank of the reduced correlation matrix then the smallest 
number of linearly independent factors which will account for the correlations is m, 
or, the common-factor space is of m dimensions. 

It should be emphasized that while the dimension of the common-factor space 
can be determined by any of several methods (see part ii), such solutions do not 
provide unique positions of the factors. Any m linearly independent factors form a 
basis of the common-factor space, and, as noted before, the selection of a particular 
basis is considered in part iii. 

In order to clarify the preceding ideas, the three important spaces will be reviewed.
For any variable $z_j$ the system $(z_{j1}, z_{j2}, \ldots, z_{jN})$ of $N$ real numbers may be considered
as the rectangular Cartesian coordinates of a point in an $N$-dimensional space. By
means of this vector representation, the configuration of two variables is merely
two dimensional, i.e., in a plane, although it has to be regarded as imbedded in an
$N$-space. In general, the configuration of $n$ vectors may be regarded as in an
$n$-dimensional space which is imbedded in the original $N$-space. For purposes of factor
analysis, this space may be reduced further, to the $m$-space which will contain the
$n$ vectors, as indicated in Theorem 4.6.

Before giving the final interpretation of the vectors representing the variables in
the common-factor space, the geometric meaning of the linear expressions (2.9)
which include unique as well as common factors will be indicated. The $n$ vectors
may then be considered in the total-factor space of the $m$ common factors and
$n$ unique factors. The vector representation of any variable in this space is given by

$$z_j'\colon (a_{j1}, a_{j2}, \ldots, a_{jm};\; 0, \ldots, 0, d_j, 0, \ldots, 0),$$

where the prime is employed to indicate the linear model (2.9) for the observed
variable $z_j$. The first $m$ coordinates are with respect to the common-factor axes, and
the last $n$ coordinates, consisting of only one value different from zero, are with
respect to the unique-factor axes. For simplicity let it be assumed that the common
factors are mutually orthogonal, and, as usual, the unique factors are orthogonal to
all factors. Then the norm, or length, of such a vector, according to (4.12), is

$$N(z_j') = \sqrt{a_{j1}^2 + \cdots + a_{jm}^2 + d_j^2} = 1. \tag{4.50}$$

In other words, each of the vectors representing the variables in the total-factor space
is of unit length. The direction cosines of such a vector in this space are simply the
coordinates of the end point. The cosine of the angle of inclination ($\phi_{jk}$) of two such
vectors, $z_j'$ and $z_k'$, then becomes

$$\cos\phi_{jk} = \sum_{p=1}^{m+n} \lambda_{jp}' \lambda_{kp}' = \sum_{p=1}^{m} a_{jp} a_{kp} = r_{jk}', \tag{4.51}$$

where $\lambda_{jp}'$ and $\lambda_{kp}'$ denote the sets of direction cosines of $z_j'$ and $z_k'$, respectively.
Equation (4.51) shows that the reproduced correlation for any two variables is the
cosine of the angle between their vectors in the total-factor space. The reproduced
correlation $r_{jk}'$ will approximate the observed correlation $r_{jk}$ to the extent that the
mathematical models of the variables are adequate.

Now the final interpretation of the variables as vectors in the common-factor space 
can be made. The orthogonal projections of the n vectors from the total-factor space 
into the common-factor space of m dimensions are defined to be the vectors representing 
the variables in this subspace. Such a vector may be denoted by 

$z''_j : (a_{j1}, a_{j2}, \ldots, a_{jm}).$

The coordinates of the end point of this vector are the same as the first m coordinates 
in the total-factor space. This property holds even if the common-factor axes are 
oblique, provided only that the unique-factor axes are orthogonal to the common- 
factor space. It will again be assumed for simplicity that the common factors are 
uncorrelated. 

A projected vector in the m-space is usually of smaller magnitude than the corresponding 
vector in the total-factor space, being of the same length only if the variable 
has no unique variance. Likewise, the angles between pairs of vectors in the common-factor 
space are smaller and, consequently, their cosines larger. The length of a 
vector $z''_j$ in this space is given by 

(4.52)  $N(z''_j) = \sqrt{a_{j1}^2 + a_{j2}^2 + \cdots + a_{jm}^2} = h_j,$

that is, the square root of the communality $h_j^2$ of the variable. The direction cosines 
of any two vectors $z''_j$ and $z''_k$ in the common-factor space are given by 

(4.53)  $\lambda''_{jp} = a_{jp}/h_j, \qquad \lambda''_{kp} = a_{kp}/h_k \qquad (p = 1, 2, \ldots, m).$

Putting these values into (4.37), the cosine of the angle of inclination of these vectors 
becomes 


(4.54)  $\cos\phi''_{jk} = \sum_{p=1}^{m} \lambda''_{jp}\lambda''_{kp} = \sum_{p=1}^{m} a_{jp}a_{kp}/(h_jh_k).$


It is obvious that this expression is generally larger than that given by (4.51), being 
equal to it only when $h_jh_k = 1$. Hence the angles between vectors in the common-factor 
space are smaller than the corresponding angles in the total-factor space. 

The problem of interpreting a reproduced correlation $r'_{jk}$ geometrically can be 
treated in the common-factor space. It is evident from (4.51) and (4.54) that 

(4.55)  $r''_{jk} = \cos\phi''_{jk} = r'_{jk}/(h_jh_k).$

The cosine of the angle of separation of two vectors representing variables in the 
common-factor space may be referred to as the correlation corrected for uniqueness. 
In other words, the expression (4.55) would be the value of the reproduced correlation 
between j and k if these variables were free from unique variance. Solving (4.55) 
explicitly for the reproduced correlation, there results 

(4.56)  $r'_{jk} = h_jh_k\cos\phi''_{jk} = h_jh_kr''_{jk}.$

Thus, by formula (4.38), the reproduced correlation between two variables is given 
by the scalar product of their vectors in the common-factor space. Of course, the 
observed correlation $r_{jk}$ differs from the value given in (4.56), unless the residual is 
zero. 
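
The passage from the reproduced correlation to the correlation corrected for uniqueness, and back again, may be sketched as follows. The loadings are hypothetical, orthogonal common factors are assumed, and the variable names are introduced here only for illustration.

```python
import math

# Hypothetical loadings of two variables on two orthogonal common factors.
loadings = [[0.8, 0.3], [0.6, 0.5]]

def dot(u, v):
    """Scalar product of two coordinate lists."""
    return sum(x * y for x, y in zip(u, v))

h = [math.sqrt(dot(a, a)) for a in loadings]   # (4.52): vector lengths h_j
r_reproduced = dot(loadings[0], loadings[1])   # (4.51): r'_12
r_corrected = r_reproduced / (h[0] * h[1])     # (4.55): cosine in the common-factor space
r_back = h[0] * h[1] * r_corrected             # (4.56): recovers r'_12
print(h, r_reproduced, r_corrected, r_back)
```

Because each length is at most one, the corrected value is never smaller in absolute value than the reproduced correlation.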

Many concepts useful in factor analysis have been developed and given geometric 
interpretation in this chapter. In order to summarize these ideas, and to make them 
readily available for future reference, they are recapitulated in Table 4.1. 

A simple illustration of the foregoing ideas is given for the case of only two factors. 
The common-factor space is of two dimensions, and the two (uncorrelated) factors 
$F_1$ and $F_2$ are represented in Figure 4.3 by unit vectors separated by a right angle. 
Each variable $z_j$ of a set can be described in terms of the two common factors and a 
unique factor. The linear expressions for two such variables may be written as follows: 

$z'_1 = a_{11}F_1 + a_{12}F_2 + d_1U_1 + 0 \cdot U_2,$

$z'_2 = a_{21}F_1 + a_{22}F_2 + 0 \cdot U_1 + d_2U_2.$

The geometric representation of these linear expressions for the original variables can 
be made in the total-factor space of four dimensions, defined by the two common 
factors and the two unique factors. In this space the vectors representing $z'_1$ and 
$z'_2$ are of unit length, and their correlation is given by 

$r'_{12} = a_{11}a_{21} + a_{12}a_{22}.$

Table 4.1 

Concepts Stemming from Vector Representation of Variables 

(N = number of individuals; n = number of variables; m = number of common factors) 

| Concept | Sample Space: Variables in Deviate Form | Sample Space: Variables in Standard Form | Variable Space | Total-Factor Space | Common-Factor Space |
|---|---|---|---|---|---|
| Number of dimensions | N | N | n | n + m | m |
| Variables (j = 1, 2, ..., n) | $x_j$ | $z_j$ | | $z'_j = a_{j1}F_1 + \cdots + a_{jm}F_m + d_jU_j$ | $z''_j = a_{j1}F_1 + \cdots + a_{jm}F_m$ |
| Coordinates (i = 1, ..., N; p = 1, ..., m) | $x_{ji}$ | $z_{ji}$ | | $a_{jp}$, $d_j$ | $a_{jp}$ |
| Length of radius vector | $\rho_j = \sqrt{N}\,s_j$ | $\rho_j = \sqrt{N}$ | | $\rho_j = N(z'_j) = 1$ | $\rho_j = N(z''_j) = h_j$ |
| Direction cosines of vector corresponding to variable j | $\lambda_{ji} = x_{ji}/\rho_j$ | $\lambda_{ji} = z_{ji}/\sqrt{N}$ | | $\lambda'_{jp} = a_{jp}$, $\lambda'_{j,m+j} = d_j$, $\lambda'_{j,m+k} = 0$ for $k \ne j$ (k = 1, ..., n) | $\lambda''_{jp} = a_{jp}/h_j$ |
| Angle between variables j and k | $\theta_{jk}$ | $\theta_{jk}$ | | $\phi'_{jk}$ | $\phi''_{jk}$ |
| Cosine of angle | $\cos\theta_{jk} = \sum_i \lambda_{ji}\lambda_{ki} = \sum_i x_{ji}x_{ki}/(Ns_js_k) = r_{jk}$ | $\cos\theta_{jk} = \sum_i \lambda_{ji}\lambda_{ki} = \sum_i z_{ji}z_{ki}/N = r_{jk}$ | | $\cos\phi'_{jk} = \sum_{p=1}^{m+n} \lambda'_{jp}\lambda'_{kp} = \sum_{p=1}^{m} a_{jp}a_{kp} = r'_{jk}$ | $\cos\phi''_{jk} = \sum_{p=1}^{m} \lambda''_{jp}\lambda''_{kp} = \sum_{p=1}^{m} a_{jp}a_{kp}/(h_jh_k) = r''_{jk}$ |
| Correlation between variables j and k | $r_{jk}$ | $r_{jk}$ | | $r'_{jk} = \sum_{p=1}^{m} a_{jp}a_{kp}$ | $r'_{jk} = h_jh_kr''_{jk}$ |

Such essential information about the variables as the correlations can be obtained 
from the consideration of the common-factor space. The projections of the two 
vectors $z'_1$ and $z'_2$ into this space are indicated in Figure 4.3 by $z''_1$ and $z''_2$, respectively, 
and may be written analytically in the form 

$z''_1 = a_{11}F_1 + a_{12}F_2,$

$z''_2 = a_{21}F_1 + a_{22}F_2.$

The lengths of these vectors are given by the square roots of their communalities, i.e., 

$D(Oz''_1) = \sqrt{a_{11}^2 + a_{12}^2} = h_1,$

$D(Oz''_2) = \sqrt{a_{21}^2 + a_{22}^2} = h_2.$

The cosine of the angle ($\phi''_{12}$) separating these vectors is given by 

$\cos\phi''_{12} = \frac{a_{11}}{h_1}\frac{a_{21}}{h_2} + \frac{a_{12}}{h_1}\frac{a_{22}}{h_2} = \frac{r'_{12}}{h_1h_2},$

or 

$r'_{12} = h_1h_2\cos\phi''_{12}.$

This formula shows that the reproduced correlation of two variables is given by 
the product of the lengths of the two vectors by the cosine of their angle of separation 
in the common-factor space. 

In the foregoing discussion it was necessary to employ distinct notation to clearly 
represent elements in the different spaces. Since it would be rather clumsy to retain 
primes and double primes in the remainder of this volume, they will be dropped 
when no confusion can arise as to the particular space involved. 

The Problem of Communality 


5.1. Introduction 

In the preceding chapter it was shown that the dimension of the common-factor 
space is equal to the rank of the reduced correlation matrix. For a given set of 
observed correlations the rank of this matrix is affected by the particular values put 
in the principal diagonal, and it will also be recalled from 2.8 that the portions 
of the variances to be factored are determined by these diagonal elements. When 
unities are employed, i.e., assuming the component analysis model (2.8), the resulting 
descriptions of the n variables are in terms of n (rarely fewer) common factors. With 
the classical factor analysis model (2.9), the communalities are the basic quantities 
to be analyzed. Herein lies the trouble—there is no a priori knowledge of the values 
of the communalities. Either the rank of the correlation matrix or its diagonal values 
must be known, or approximated, in order to obtain a factor solution. According 
to Theorem 4.6, if the rank of the reduced correlation matrix is known (or can be 
assumed) to be m, then the common-factor space is of m-dimensions. Several procedures 
for getting factor solutions make suitable approximations of the number 
of common factors (with more or less refined statistical tests of the adequacy of the 
assumed m). Other procedures require some estimates of the communalities, i.e., the 
diagonal values of the correlation matrix rather than its rank. The latter approach 
is employed in the principal-factor method and the centroid method, treated in 
chapter 8, as preliminary multiple-factor solutions (which may subsequently be 
transformed to more desirable solutions by the methods of part iii). A problem that 
has been plaguing factor analysts since the beginning of the multiple factor approaches 
is the question of how to determine suitable approximations to “communality”. 

The general problem of communality is approached through the following considerations: 
(1) the conditions that the correlation coefficients must satisfy in order 
for their matrix to have a given rank; (2) the determination of communality under 
the assumption of the rank of the correlation matrix; (3) the theoretical solution for 
communality; (4) the approximations to communality without prior knowledge of the 
rank; (5) a simple example in which a factor solution is obtained directly, once the 
communalities of the variables have been determined. The first of these items is covered 
in Sections 5.2 and 5.3, while the determination of communality using assumed rank 
is developed and illustrated in 5.4. The theoretical considerations are presented in 
5.5, and followed by practical methods in 5.6 and 5.7, with examples of these methods 
in 5.8. Finally, in 5.9, a hypothetical example is employed to bring together the 
notions of communality and the factor model developed in the preceding chapters. 

5.2. Determination of the Common-Factor Space 

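Theorem 4.6 states that the common-factor space is of m dimensions, where m is 
the rank of the reduced correlation matrix: 

$R = \begin{pmatrix} h_1^2 & r_{12} & \cdots & r_{1n} \\ r_{21} & h_2^2 & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & h_n^2 \end{pmatrix}.$
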
One of the major problems in factor analysis is to determine how much the rank of 
this matrix can be reduced from n by a suitable choice of the communalities for a 
given set of observed correlations. The number of linearly independent conditions 
that the unknown communalities $h_j^2$ must satisfy in order that the matrix R shall 
be of rank m can be determined by means of the following theorem (for proof see 
[101, p. 79]). 

Theorem 5.1. The rank of the symmetric matrix R is m if an m-rowed principal 
minor $R_{mm}$ is not zero and if zero is the value of every principal minor obtained by 
annexing to $R_{mm}$ one row and the same column of R, and also of every principal minor 
obtained by annexing two rows and the same two columns. 

The number of conditions imposed on the communalities can be found by formal 
application of the procedure set forth in this theorem. There are n rows in R and m 
in the nonvanishing principal minor $R_{mm}$. This leaves $(n - m)$ rows which may be 
annexed, one at a time, to $R_{mm}$, or $(n - m)$ determinants which must vanish. The 
$(n - m)$ rows may be added two at a time in $\binom{n-m}{2}$ ways to $R_{mm}$, giving 
$(n - m)(n - m - 1)/2$ additional determinants which must vanish. Hence the total number 
of independent conditions (i.e., the number of minors set equal to zero) that the 
communalities must satisfy in order that R be of rank m is* 

(5.1)  $\nu_m = (n - m) + (n - m)(n - m - 1)/2 = (n - m)(n - m + 1)/2.$

* The proof given here is incomplete in the sense that the $\nu_m$ equations have not been shown 
to be linearly independent, i.e., that none of them follows from the others. Walter Lederman [334] 
arrives at the same number of conditions for the n unknown communalities, although by a different 
argument, and offers a proof of the linear independence of these equations. 

In general, the $\nu_m$ equations have solutions $h_1^2, h_2^2, \ldots, h_n^2$ only if the number of 
unknowns is greater than or equal to this number of conditions. If the number of 
unknowns is less than the number of equations, then the coefficients in the equations 
(the correlations) must satisfy certain relations in order that the number of independent 
conditions for the unknowns may be reduced to the number of unknowns, 
i.e., in order that the equations be consistent. 

First, however, let it be assumed that the correlations are arbitrary values, in the 
sense that no constraints exist among them. Then the set of conditions can be satisfied 
only if they are no greater in number than the unknowns, viz., 

(5.2)  $n \ge \nu_m.$

The last inequality may be written in the following equivalent form: 

(5.3)  $\phi(m) = \nu_m - n = [m^2 - (2n + 1)m + n(n - 1)]/2 \le 0.$

Setting the quadratic equal to zero and solving for m, the two roots are given by 

(5.4)  $m = [(2n + 1) \pm \sqrt{8n + 1}]/2.$

It can readily be shown that the plot of the quadratic function $\phi(m)$, for any fixed 
value of n, is a parabola which opens upward. A typical member of this family 
of parabolas is shown in Figure 5.1 for the case of n = 9. While the complete 
mathematical function is depicted in this figure, only the left part of the curve up to 
the point where m = n has any interpretive value for factor analysis. In general the 
curve crosses the m-axis at the two points whose abscissas are given in (5.4), and 
hence $\phi(m) \le 0$ for values of m between these extremes. The rank of the correlation 
matrix, with unknown communalities and arbitrary correlations, may thus be 
reduced to the value m, which is given by 

(5.5)  $[(2n + 1) + \sqrt{8n + 1}]/2 \ge m \ge [(2n + 1) - \sqrt{8n + 1}]/2.$

The smallest possible value for m is then the smallest integer greater than or equal 
to the value in the right-hand member of (5.5). In Table 5.1 there is listed the smallest 
rank that can be attained for a matrix of a given order, up to n = 15, when the 
correlations are assumed to be quite arbitrary. 
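
The minimum rank can be computed either by searching for the first m that satisfies (5.2) or from the smaller root in (5.4). The sketch below does both; the function names are illustrative and not part of the text.

```python
import math

def nu(n, m):
    """Number of conditions (5.1) on the communalities for rank m."""
    return (n - m) * (n - m + 1) // 2

def minimum_rank(n):
    """Smallest m for which nu_m <= n, i.e. inequality (5.2) holds."""
    m = 1
    while nu(n, m) > n:
        m += 1
    return m

def minimum_rank_from_root(n):
    """Smallest integer not less than the right-hand member of (5.5)."""
    return math.ceil(((2 * n + 1) - math.sqrt(8 * n + 1)) / 2)

orders = list(range(2, 16))
ranks = [minimum_rank(n) for n in orders]
print(list(zip(orders, ranks)))
```

Both routes give the same values for n = 2 through 15, which is the range covered by Table 5.1.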

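Table 5.1 

Minimum Rank under Assumption of Independent Correlations 

| n | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| m | 1 | 1 | 2 | 3 | 3 | 4 | 5 | 6 | 6 | 7 | 8 | 9 | 10 | 10 |
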

Generally, the observed correlations from statistical variables cannot be considered 
as arbitrary or independent. The inequality in (5.2) may then be reversed, that is, the 
number of unknowns may be less than the number of conditions which they must 
satisfy. The unknown communalities are then “overdetermined” in the sense that 
the larger number of equations may not be consistent. In order for a solution to exist, 
the coefficients in the equations must satisfy at least $\phi(m)$ relations. The differences 
$\phi(m) = (\nu_m - n)$, that is, the number of conditions that the correlations must satisfy 
so that a matrix of order n can be reduced to rank m, are given in Table 5.2. 

Fig. 5.1. Function $\phi(m)$ for n = 9 

The lower left-hand corner of the table has no entries because the rank cannot 
exceed the order of a matrix. A negative value represents a larger number of unknowns 
than conditions, so that there is an infinite number of solutions in such a case, the 
general solution involving (n — v m ) arbitrary parameters. A zero value represents 
the case of as many unknowns as equations. For a given number of variables, n, a 
negative or zero entry indicates the rank m which the correlation matrix can attain 
without any restrictions on the correlations. In these cases the inequality (5.2) is 
satisfied, and the conditions on the communalities are met under the assumption 
that the correlations are independent variables. The value of m for the first negative 
or zero entry, reading down a column of Table 5.2, corresponds to the value of m, 
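for the same n, in Table 5.1. 

Table 5.2 

Number of Linearly Independent Conditions on the Correlations: $\phi(m)$ 

| m \ n | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | -1 | 0 | 2 | 5 | 9 | 14 | 20 | 27 | 35 | 44 | 54 | $\binom{n-1}{2} - 1 = \frac{n(n-3)}{2}$ |
| 2 | -2 | -2 | -1 | 1 | 4 | 8 | 13 | 19 | 26 | 34 | 43 | $\binom{n-2}{2} - 2 = \frac{n(n-5)}{2} + 1$ |
| 3 | | -3 | -3 | -2 | 0 | 3 | 7 | 12 | 18 | 25 | 33 | $\binom{n-3}{2} - 3 = \frac{n(n-7)}{2} + 3$ |
| 4 | | | -4 | -4 | -3 | -1 | 2 | 6 | 11 | 17 | 24 | $\binom{n-4}{2} - 4 = \frac{n(n-9)}{2} + 6$ |
| 5 | | | | -5 | -5 | -4 | -2 | 1 | 5 | 10 | 16 | $\binom{n-5}{2} - 5 = \frac{n(n-11)}{2} + 10$ |
| 6 | | | | | -6 | -6 | -5 | -3 | 0 | 4 | 9 | $\binom{n-6}{2} - 6 = \frac{n(n-13)}{2} + 15$ |
| m | | | | | | | | | | | | $\binom{n-m}{2} - m = \frac{n(n-2m-1)}{2} + \frac{m(m-1)}{2}$ |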

Of course, there would be little gained in parsimony of thought if the correlations 
in a study were indeed independent and required as many factors as indicated in 
Table 5.1. In a factor study the investigator usually selects the variables on some 
hypothesis of an underlying order, and the correlations cannot be expected to be 
independent. The relationships that exist among them lead to a rank of the correlation 
matrix lower than its order, if appropriate values are inserted in the principal diagonal. 
In the next section specific mathematical conditions are indicated which guarantee 
that the common-factor space is of one and two dimensions. In subsequent sections 
approximate methods are presented for the determination of higher dimensional 
common-factor spaces. 

5.3. Conditions for Reduced Rank of Correlation Matrix 

The number of independent relationships that must exist among the correlations 
in order that the rank shall be lower than the minimum in (5.5) is given by the 
positive values in Table 5.2. Thus for n = 3, no relationships are necessary to attain 
rank one; that is, three variables can always be described in terms of one common 
factor. Four variables, however, cannot be described by just one factor unless their 
intercorrelations satisfy two (independent) conditions. These well-known conditions, 
due to Spearman [440], are the vanishing of the tetrads, namely, 

(5.6)  $r_{12}r_{34} - r_{14}r_{23} = 0, \qquad r_{13}r_{24} - r_{14}r_{23} = 0.$

It may be well to indicate how the conditions (5.6) are arrived at when n = 4 and 
m = 1, so that the method of generalization will be more evident. When there are 
just four variables, the reduced matrix of correlations is simply 


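$R = \begin{pmatrix} h_1^2 & r_{12} & r_{13} & r_{14} \\ r_{21} & h_2^2 & r_{23} & r_{24} \\ r_{31} & r_{32} & h_3^2 & r_{34} \\ r_{41} & r_{42} & r_{43} & h_4^2 \end{pmatrix}.$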


This matrix will be of rank one if all second-order minors vanish. By selecting 
appropriate minors, each involving one row and column intersecting in the desired 
communality and the other row and column different from each other, several 
linear equations result for the solution of each of the communalities. Thus, $h_1^2$ 
is given by any one of the following three equations: 

$\begin{vmatrix} h_1^2 & r_{13} \\ r_{21} & r_{23} \end{vmatrix} = 0, \qquad \begin{vmatrix} h_1^2 & r_{14} \\ r_{21} & r_{24} \end{vmatrix} = 0, \qquad \begin{vmatrix} h_1^2 & r_{14} \\ r_{31} & r_{34} \end{vmatrix} = 0,$

or 

(5.7)  $h_1^2 = r_{12}r_{13}/r_{23} = r_{12}r_{14}/r_{24} = r_{13}r_{14}/r_{34}.$

On eliminating $h_1^2$, the two equations (5.6) arise. The three separate solutions (5.7) 
for $h_1^2$ are consistent if the conditions (5.6) are satisfied. 
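
A small exact check of (5.6) and (5.7) may be made with correlations generated from a single common factor. The loadings below are hypothetical, and rational arithmetic is used so that the vanishing of the tetrads is exact rather than approximate.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical loadings of four variables on one common factor.
a = [F(9, 10), F(8, 10), F(7, 10), F(6, 10)]

def r(j, k):
    """Correlation between variables j and k (1-based) under rank one."""
    return a[j - 1] * a[k - 1]

# The two tetrad differences of (5.6).
tetrads = [r(1, 2) * r(3, 4) - r(1, 4) * r(2, 3),
           r(1, 3) * r(2, 4) - r(1, 4) * r(2, 3)]

# The three solutions (5.7) for the first communality.
h1_squared = [r(1, j) * r(1, k) / r(j, k) for j, k in combinations([2, 3, 4], 2)]
print(tetrads, h1_squared)
```

With rank-one correlations all three solutions coincide with the square of the first loading, as the consistency argument requires.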

Referring to Table 5.2 again, it is seen that five relationships must exist among the 
correlations of five variables if they are to be described in terms of only one common 
factor. From their correlation matrix, with unknown communalities in the principal 
diagonal, several linear equations for the solution of each of the communalities can 
be obtained by setting appropriate second-order minors equal to zero. The first 
communality, for example, is given by any one of the six equations: 

$\begin{vmatrix} h_1^2 & r_{13} \\ r_{21} & r_{23} \end{vmatrix} = 0, \qquad \begin{vmatrix} h_1^2 & r_{14} \\ r_{21} & r_{24} \end{vmatrix} = 0, \qquad \begin{vmatrix} h_1^2 & r_{15} \\ r_{21} & r_{25} \end{vmatrix} = 0,$

$\begin{vmatrix} h_1^2 & r_{14} \\ r_{31} & r_{34} \end{vmatrix} = 0, \qquad \begin{vmatrix} h_1^2 & r_{15} \\ r_{31} & r_{35} \end{vmatrix} = 0, \qquad \begin{vmatrix} h_1^2 & r_{15} \\ r_{41} & r_{45} \end{vmatrix} = 0,$

or 

(5.8)  $h_1^2 = r_{12}r_{13}/r_{23} = r_{12}r_{14}/r_{24} = r_{12}r_{15}/r_{25} = r_{13}r_{14}/r_{34} = r_{13}r_{15}/r_{35} = r_{14}r_{15}/r_{45}.$

The five conditions that must be satisfied to assure a unique value of $h_1^2$ may be put 
in the equivalent form: 

(5.9)  $\begin{aligned} r_{13}r_{24} - r_{14}r_{23} &= 0, \\ r_{13}r_{25} - r_{15}r_{23} &= 0, \\ r_{12}r_{34} - r_{14}r_{23} &= 0, \\ r_{12}r_{35} - r_{15}r_{23} &= 0, \\ r_{13}r_{45} - r_{14}r_{35} &= 0. \end{aligned}$

Any other conditions must be linearly dependent on the above equations. Thus if, 
instead of obtaining the solutions for $h_1^2$, those for $h_2^2$ were obtained, the resulting 
five conditions could then be shown to be dependent on the foregoing relations 

(see ex. 1). 

In general, a set of n variables is contained in a one-dimensional common-factor 
space if n(n - 3)/2 relationships exist among their correlations, according to the 
last entry on the first line of Table 5.2. These conditions, whatever form they take, 
are equivalent to the following set: 


- n — L yfl 

For any variable $z_e$, a term of the form 

(5.11)  $t_{jk} = \frac{r_{ej}r_{ek}}{r_{jk}} \qquad (e, j, k = 1, 2, \ldots, n;\ e \ne j \ne k)$

is called a triad. When the matrix of correlations is of rank one then the communality 
of variable $z_e$ is given by any one of the triads (5.11). It is readily seen that there are 
$\binom{n-1}{2}$ triads in (5.10), or (subtracting one from this number), n(n - 3)/2 equations 
of condition for one general factor among n variables. 

The number of conditions of the form (5.10) to determine whether a matrix is of 
rank one is considerably less than the number of tetrads. Every four variables give 
rise to three tetrads so that the total number of different tetrads for n variables is 

(5.12)  $3\binom{n}{4} = \frac{n(n-1)(n-2)(n-3)}{8}.$

The difference between this number and the number of triad conditions is 

(5.13)  $\frac{n(n-1)(n-2)(n-3)}{8} - \frac{n(n-3)}{2} = \frac{n(n-3)(n^2 - 3n - 2)}{8}.$

To indicate the magnitude of this number, suppose n = 15. The total number of 
tetrads is 4,095, while the number of triad conditions (5.10) is only 90. In other 
words, the labor of computing the conditions (5.10) is only about one-fortieth of 
that of computing the tetrads, for fifteen variables. For a large number of variables 
the economy of labor becomes more pronounced. 
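
The two counts compared above are easily tabulated. The following sketch uses `math.comb` from the Python standard library; the function names are illustrative only.

```python
from math import comb

def tetrad_count(n):
    """Three tetrads for every set of four variables, as in (5.12)."""
    return 3 * comb(n, 4)

def triad_condition_count(n):
    """Number of independent rank-one conditions, n(n - 3)/2."""
    return n * (n - 3) // 2

for n in (5, 10, 15, 20):
    print(n, tetrad_count(n), triad_condition_count(n))
```

For n = 15 the function values agree with the figures quoted in the text, and the ratio of the two counts continues to widen as n grows.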

The necessary conditions for a matrix of correlations to attain rank two will next 
be considered. For five variables it is a well-known fact that one relationship must 
exist, namely, the following pentad criterion, first obtained by Kelley [305, p. 58]: 

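(5.14)  $\begin{aligned} & r_{12}r_{23}r_{34}r_{45}r_{51} - r_{12}r_{23}r_{35}r_{41}r_{54} - r_{12}r_{24}r_{35}r_{43}r_{51} \\ & + r_{12}r_{24}r_{31}r_{45}r_{53} + r_{12}r_{25}r_{34}r_{41}r_{53} - r_{12}r_{25}r_{31}r_{43}r_{54} \\ & - r_{13}r_{24}r_{35}r_{41}r_{52} + r_{13}r_{25}r_{34}r_{42}r_{51} + r_{14}r_{23}r_{31}r_{45}r_{52} \\ & - r_{14}r_{25}r_{32}r_{43}r_{51} - r_{15}r_{23}r_{31}r_{42}r_{54} + r_{15}r_{24}r_{32}r_{41}r_{53} = 0. \end{aligned}$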

This condition can be derived in a manner similar to the preceding. The correlation 
matrix for the five variables must have every third-order minor equal to zero if it 
is to be of rank two. By selecting appropriate minors, several linear equations for 
the solution of each of the communalities and hence the conditions for consistency 
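can be obtained. Accordingly, for $h_1^2$ the following two determinants are employed: 

$\begin{vmatrix} h_1^2 & r_{13} & r_{15} \\ r_{21} & r_{23} & r_{25} \\ r_{41} & r_{43} & r_{45} \end{vmatrix} = 0, \qquad \begin{vmatrix} h_1^2 & r_{14} & r_{15} \\ r_{21} & r_{24} & r_{25} \\ r_{31} & r_{34} & r_{35} \end{vmatrix} = 0.$

The two solutions for $h_1^2$ are: 

$h_1^2 = [r_{21}(r_{13}r_{45} - r_{15}r_{43}) - r_{41}(r_{13}r_{25} - r_{15}r_{23})]/(r_{23}r_{45} - r_{25}r_{43}),$

$h_1^2 = [r_{21}(r_{14}r_{35} - r_{15}r_{34}) - r_{31}(r_{14}r_{25} - r_{15}r_{24})]/(r_{24}r_{35} - r_{25}r_{34}).$

Equating these, the following single consistency condition results: 

(5.15)  $\begin{aligned} & [r_{24}r_{35} - r_{25}r_{34}][r_{21}(r_{13}r_{45} - r_{15}r_{43}) - r_{41}(r_{13}r_{25} - r_{15}r_{23})] \\ & - [r_{23}r_{45} - r_{25}r_{43}][r_{21}(r_{14}r_{35} - r_{15}r_{34}) - r_{31}(r_{14}r_{25} - r_{15}r_{24})] = 0. \end{aligned}$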

If the correlations of five variables satisfy (5.15), or the equivalent condition (5.14), then 
five communalities can be determined to make the rank of the correlation matrix 
equal to two, i.e., the five variables can be described in terms of two common factors. 
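
The pentad criterion can be verified on correlations built from two common factors. In the sketch below the loadings are hypothetical, the twelve terms of (5.14) are entered as signed lists of subscript pairs, and the helper `h1_squared` (a name introduced here) solves a third-order determinant of the kind shown above for the first communality.

```python
from fractions import Fraction as F

# Hypothetical loadings of five variables on two orthogonal common factors.
A = {1: (F(7, 10), F(2, 10)), 2: (F(6, 10), F(3, 10)), 3: (F(5, 10), F(5, 10)),
     4: (F(3, 10), F(6, 10)), 5: (F(2, 10), F(7, 10))}

def r(j, k):
    """Reproduced correlation between variables j and k under rank two."""
    return A[j][0] * A[k][0] + A[j][1] * A[k][1]

# Signs and subscript pairs of the twelve products in (5.14).
PENTAD_TERMS = [
    (+1, "12 23 34 45 51"), (-1, "12 23 35 41 54"), (-1, "12 24 35 43 51"),
    (+1, "12 24 31 45 53"), (+1, "12 25 34 41 53"), (-1, "12 25 31 43 54"),
    (-1, "13 24 35 41 52"), (+1, "13 25 34 42 51"), (+1, "14 23 31 45 52"),
    (-1, "14 25 32 43 51"), (-1, "15 23 31 42 54"), (+1, "15 24 32 41 53"),
]

def pentad():
    """Left-hand member of (5.14)."""
    total = F(0)
    for sign, spec in PENTAD_TERMS:
        term = F(sign)
        for pair in spec.split():
            term *= r(int(pair[0]), int(pair[1]))
        total += term
    return total

def h1_squared(a, b, c, d):
    """h_1^2 from the vanishing determinant with rows (1, a, c) and columns (1, b, d)."""
    numerator = (r(a, 1) * (r(1, b) * r(c, d) - r(1, d) * r(c, b))
                 - r(c, 1) * (r(1, b) * r(a, d) - r(1, d) * r(a, b)))
    return numerator / (r(a, b) * r(c, d) - r(a, d) * r(c, b))

first = h1_squared(2, 3, 4, 5)    # first determinant in the text
second = h1_squared(2, 4, 3, 5)   # second determinant in the text
print(pentad(), first, second)
```

Exact rational arithmetic is used so that the criterion vanishes identically; with observed correlations only an approximate zero could be expected.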

According to Table 5.2, the correlations among six variables must satisfy four 
independent conditions in order for their correlation matrix to attain rank two. 
It will be convenient to define the following: 

(5.16)  $|h_1^2\ r_{ab}\ r_{cd}| = \begin{vmatrix} h_1^2 & r_{1b} & r_{1d} \\ r_{a1} & r_{ab} & r_{ad} \\ r_{c1} & r_{cb} & r_{cd} \end{vmatrix},$

that is, a representation of a determinant by the elements of the principal diagonal. 
In order that a 6 x 6 matrix be of rank two, every third-order minor must vanish. 
Five such determinants (each involving the first row and column but otherwise 
unique rows and columns) are set equal to zero, as follows: 

$|h_1^2\ r_{23}\ r_{45}| = |h_1^2\ r_{23}\ r_{46}| = |h_1^2\ r_{23}\ r_{56}| = |h_1^2\ r_{24}\ r_{56}| = |h_1^2\ r_{34}\ r_{56}| = 0.$

The solutions for $h_1^2$ from these five equations follow: 

(5.17)  $\begin{aligned} h_1^2 &= \frac{r_{21}|r_{13}\ r_{45}| - r_{41}|r_{13}\ r_{25}|}{|r_{23}\ r_{45}|} = \frac{r_{21}|r_{13}\ r_{46}| - r_{41}|r_{13}\ r_{26}|}{|r_{23}\ r_{46}|} = \frac{r_{21}|r_{13}\ r_{56}| - r_{51}|r_{13}\ r_{26}|}{|r_{23}\ r_{56}|} \\ &= \frac{r_{21}|r_{14}\ r_{56}| - r_{51}|r_{14}\ r_{26}|}{|r_{24}\ r_{56}|} = \frac{r_{31}|r_{14}\ r_{56}| - r_{51}|r_{14}\ r_{36}|}{|r_{34}\ r_{56}|}, \end{aligned}$

where 

(5.18)  $|r_{ab}\ r_{cd}| = \begin{vmatrix} r_{ab} & r_{ad} \\ r_{cb} & r_{cd} \end{vmatrix} = r_{ab}r_{cd} - r_{ad}r_{cb}.$

Eliminating $h_1^2$ from the five equations (5.17), the four conditions which the 
correlations must satisfy are obtained. The equalities among the right-hand members of 
equations (5.17) are the necessary conditions for six variables to be describable in 
terms of two common factors. 

This process can be continued for any number of variables. The positive entries 
for m = 2 in Table 5.2 give the number of conditions that must exist among the 
correlations for a set of n variables to be describable in terms of only two common 
factors. In general, the correlations among n variables must satisfy $(n^2 - 5n + 2)/2$ 
conditions in order that their matrix shall be of rank two. These conditions may be 
obtained by eliminating any communality $h_e^2$ from $(n^2 - 5n + 4)/2$ equations of the 
form 

(5.19)  $h_e^2 = \frac{r_{ae}|r_{eb}\ r_{cd}| - r_{ce}|r_{eb}\ r_{ad}|}{|r_{ab}\ r_{cd}|} \qquad (e, a, b, c, d = 1, 2, \ldots, n;\ e \ne a \ne b \ne c \ne d).$
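
Formula (5.19), with the second-order determinant of (5.18), lends itself to direct computation. The sketch below applies it to six variables whose correlations are generated from two hypothetical common factors; the function names `minor2` and `communality` are introduced here for illustration and the loadings are not taken from the text.

```python
from fractions import Fraction as F

# Hypothetical loadings of six variables on two orthogonal common factors;
# no two rows are proportional, so no denominator in (5.19) vanishes.
A = {1: (F(7, 10), F(2, 10)), 2: (F(6, 10), F(3, 10)), 3: (F(5, 10), F(5, 10)),
     4: (F(3, 10), F(6, 10)), 5: (F(2, 10), F(7, 10)), 6: (F(1, 10), F(8, 10))}

def r(j, k):
    """Reproduced correlation between variables j and k under rank two."""
    return A[j][0] * A[k][0] + A[j][1] * A[k][1]

def minor2(a, b, c, d):
    """Second-order determinant |r_ab r_cd| of (5.18)."""
    return r(a, b) * r(c, d) - r(a, d) * r(c, b)

def communality(e, a, b, c, d):
    """Evaluation of h_e^2 by (5.19)."""
    numerator = r(a, e) * minor2(e, b, c, d) - r(c, e) * minor2(e, b, a, d)
    return numerator / minor2(a, b, c, d)

# The five index sets used in (5.17) for the first communality.
index_sets = [(2, 3, 4, 5), (2, 3, 4, 6), (2, 3, 5, 6), (2, 4, 5, 6), (3, 4, 5, 6)]
values = [communality(1, *idx) for idx in index_sets]
print(values)
```

When the correlations are exactly of rank two, every evaluation returns the same communality; with empirical data the evaluations would agree only within sampling error.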

It will be noted that many more conditions than the number indicated in Table 5.2 
can be written for n variables by means of the foregoing procedure. Corresponding 
to any four indices in the denominator of (5.19) there is a third-order determinant 
of the form (5.16) which is to be set equal to zero for the calculation of a particular 
communality $h_e^2$. The total number of possible denominators, and hence, third-order 
determinants, for the calculation of the communalities would seem to be enormous. 
Fortunately, however, this number is considerably reduced owing to the symmetry 
of the correlation matrix and certain properties of second-order determinants. 

Thus, of the 24 possible evaluations of a particular communality from the permutations 
of the four indices selected for the denominator in (5.19), only two determinations 
need be considered. The total number of third-order determinants which must 
vanish, for the rank of the n x n correlation matrix to be two, becomes: 

(5.20)  $2\binom{n-1}{4} = \frac{(n-1)!}{12\,(n-5)!}.$

The number of conditions which arise upon equating the evaluations of a communality 
from these determinants is in excess of that indicated in Table 5.2 for 
n > 5. Although all these values of a particular communality must necessarily be 
equal (statistically) if the rank is two, these conditions are not all independent. The 
use of the large set of conditions furnishes a check on the rank. In practice, however, 
it will be found sufficiently accurate to equate a smaller number of evaluations of 
communality for each variable of a set, as illustrated in the following section. 

The procedure of this section can be generalized to obtain the necessary conditions 
for any correlation matrix R to be of rank m. In such case every (m + l)-order minor 
must vanish. When m is greater than two the work of computing determinants of 
the fourth or higher order becomes so laborious that no explicit conditions corresponding 
to the tetrads or pentad criterion have been worked out. While it is of 
theoretical interest to note from Table 5.2 the number of conditions that must be 
satisfied by sets of variables for their correlation matrices to attain reduced rank, 
they cannot be put to practical use beyond an expected rank of two. 

5.4. Determination of Communality from Approximate Rank 

The problem of determining communalities classically has been put in the form: 
How much can the rank of the nth-order correlation matrix be reduced by a suitable 
choice of diagonal values (communalities)? If m is the rank of the reduced correlation 
matrix then m is the smallest number of common factors necessary to account for 
the intercorrelations (Theorem 4.6); and if m can be reduced by the choice of communalities, 
then the number of required factors can be reduced and greater parsimony 
achieved.* 

* As pointed out by Lederman [334], even when the rank of the correlation matrix is fixed 
the communalities may not be uniquely determined. 

The concern in this section is with the calculation of communalities when some 
approximation to the rank of the reduced correlation matrix can be made. Such 
assumptions about the rank are also made in two methods of factor analysis (see 
chaps. 9 and 10); not, however, for purposes of computing communalities directly 
but rather as requirements of the particular methods, with the communalities coming 
out as a by-product. For the problem at hand, the reduced correlation matrix may 
be expressed in the form 

(5.21)  $R = R_0 + H$

where $R_0$ is the nth-order correlation matrix but with diagonal elements all zero, 
and H is a diagonal matrix whose elements $h_j^2$ (the communalities) are to be determined. 
The algebraic problem to be solved is the determination of H such that R 
will have the minimum rank $\mu$. 

A more restrictive aspect of this problem, and the one of greatest interest to factor 
analysts, is the determination of “rank” when only off-diagonal elements of R are 
permitted in the minors. In other words, consider the “rank” to be m if the largest 
order of a non-vanishing minor of $R_0$ is obtained by a selection of m of its rows and 
m different columns. This value m has been called the ideal rank by Albert [8], who 
obtained an algebraic solution for the communality problem for the case where 
$\mu = m$. In another paper [9], Albert proves that the solution produces unique communalities. 
Of course, in considering only minors with off-diagonal elements of R, 
it follows that a fundamental condition underlying Albert’s conclusions is that 

(5.22)  $m < n/2.$

Now, this exact mathematical condition will almost certainly never be met with 
empirical data. Thus, while Albert provides an exact algebraic solution for the 
communalities when the ideal rank is known and is less than n/2, it is a solution in 
theory only. 

The idea of determining the communalities from the known rank of the correlation 
matrix can be exploited even when the rank is not known precisely. By approximating 
the rank of the matrix, the unknown communalities may be computed from the 
conditions imposed on the correlations to attain such rank, as indicated in the last 
section. A rough estimate of the rank of the correlation matrix is given by the number 
of distinct groups of variables. Of course, this is not a mathematically defined concept, 
but in any empirical study the investigator will have some idea of the grouping of 
variables—either from the descriptive content of the variables or from a preliminary 
study of the correlations themselves. A simple statistic on the grouping of variables 
(the “coefficient of belonging”) is presented in 7.4. In any event, once the rank is 
approximated, the method of the preceding section may be applied to check whether 
the correlations satisfy the necessary conditions for the assumed rank. If the conditions 
are satisfied then the communalities are given by the explicit formulas (5.11) for rank 
one and (5.19) for rank two, while direct computing formulas for higher rank are 
not very practical. 

The actual process of determining the communalities involves a number of 
evaluations for each variable under the assumed rank of the matrix. The consistency 
of such values serves as a check on the rank, and their average is taken as the 
particular communality. Similar determinations are made for each variable of the 
set, and the assumed rank must check for all such determinations of communality. 
Of course, the correlations need not satisfy the conditions for a given rank exactly
with actual data because allowance must be made for chance errors.* It is suggested
that for rank one all possible triads be written in the calculation of a particular
communality. If the variation among these values can reasonably be assigned to chance
fluctuations, the mean value may be taken as the communality.

In the case of rank two, all possible expressions (5.19) for the determination of 
each communality could be considered. Before averaging, however, those based 
upon insignificant denominators would need to be rejected. The variables yielding 
insignificant tetrads for the denominator of (5.19) can be identified when the design 
of the variables is known, and there are but two groups. In such a case each group 
of variables will approximate rank one, and the tetrads involving three variables of 
such a group will be insignificant. Knowing the combinations of variables which 
produce insignificant denominators, it is not necessary to consider the expressions 
(5.19) which involve them. The denominators should include two variables of each 
group, considerably reducing the total number of expressions for each communality. 

When the rank of a correlation matrix is assumed, and the determination of the 
communalities is attempted, it may happen that some of the values exceed unity. 
Of course, such values of the communalities are not permitted, and they indicate 
that the particular rank assumed is inexact. Before the hypothesis of the specified 
rank is discarded, however, a number of evaluations of communality should be 
attempted. If, in general, several consistent values for each communality can be
obtained, they should be averaged for the best determination of the communality. The
justification for this procedure lies in the fact that the observed correlations are 
themselves subject to error, and the values to be supplied in the diagonal of the 
correlation matrix to produce a specified rank can only be expected to satisfy this 
hypothesis approximately. The final check lies in the agreement of the reproduced 
correlations from the solution employing these communalities, with the observed 
correlations. If the final residuals are insignificant, then the choice of the
communalities is satisfactory.

Strictly speaking, the considerations in this section apply only when the rank of 
the correlation matrix is known, and when sampling errors and rounding-off errors 
in computations are disregarded. In practice, the sampling errors to which the
correlation coefficients are subject will cause the correlation matrix generally to have a rank
equal to its order. However, it is not the exact mathematical solution of such a matrix
that is of interest to a factor analyst. Rather, it is the problem of analyzing the
experimentally determined correlation coefficients in such a manner as to make the resulting
discrepancies (residuals) insignificant in the statistical sense. Formally, the problem
is to find a reproduced correlation matrix R + of minimal rank whose elements differ
insignificantly from those in the (observed) reduced correlation matrix R.
Operationally this begs the question because the determination of the statistical significance
of residuals poses problems as great as those inherent in the determination of
communalities. Statistical tests for the significance of residuals have been provided by
Lawley [320] and Rao [394] employing the method of maximum likelihood, and the
procedure is given in chapter 10.

* No sampling error formula is known for the general expression of a communality computed
from a higher-order determinant set equal to zero. An approximation to the standard error of a
triad was obtained by Holzinger and Harman [243, sec 6.5].

To illustrate the foregoing procedure, some empirical data from Mullen [375] are
employed. A set of eight physical variables was selected, made up of two
distinct groups. As will be noted from their correlations in Table 5.3, the first four
variables are measures of “lankiness” while the last four are measures of “stockiness.”
Rank two is assumed for the correlation matrix, and checked in the process of
obtaining the communalities.


Table 5.3

Correlations Among Eight Physical Variables for 305 Girls

Variable                        1      2      3      4      5      6      7      8

1. Height                       -
2. Arm span                   .846     -
3. Length of forearm          .805   .881     -
4. Length of lower leg        .859          .801     -
5. Weight                     .473          .380   .436     -
6. Bitrochanteric diameter    .398   .326   .319   .329   .762     -
7. Chest girth                .301   .277   .237   .327   .730   .583     -
8. Chest width                .382   .415   .345   .365   .629   .577   .539     -


Assuming rank two, the communality of any variable $z_e$ can be obtained by
averaging a number of evaluations (5.19). The calculation of such expressions can
be facilitated by systematically considering the four indices a, b, c, d which determine
the denominator of (5.19). In calculating the communality for variable e = 1, for
example, the indices ab are taken to be 23, 24, and 34 (from the first group of four
variables) and the indices cd are taken to be 56, 57, 58, 67, 68, and 78 (from the second
group of four variables). When these pairs of variables are considered in every
combination, 18 separate denominators are determined and hence 18 separate values of
$h_1^2$. While 18 additional values can be obtained by interchanging the variables in
only one of the pairs ab or cd, that was not done since the original values were deemed
to be sufficiently consistent. Thus, 18 evaluations of the communality for each of the
eight variables are obtained, and the mean values are taken as their best estimates.
The resulting communalities are given in Table 5.4.
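
The bookkeeping of index pairs described above can be sketched in Python. This is only an illustration of the counting: the function name `denominator_index_sets` and the two group lists are assumptions made for the sketch, formula (5.19) itself is not evaluated, and the treatment of a variable from the second group (pairs ab drawn from its own group, pairs cd from the other) is assumed by symmetry with the case e = 1.

```python
from itertools import combinations, product

def denominator_index_sets(e, group_one, group_two):
    """Pair every ab from e's own group (e excluded) with every cd from the other group."""
    own, other = (group_one, group_two) if e in group_one else (group_two, group_one)
    ab_pairs = list(combinations([v for v in own if v != e], 2))
    cd_pairs = list(combinations(other, 2))
    return list(product(ab_pairs, cd_pairs))

lanky = [1, 2, 3, 4]    # variables 1-4 of Table 5.3
stocky = [5, 6, 7, 8]   # variables 5-8 of Table 5.3

sets_for_1 = denominator_index_sets(1, lanky, stocky)
print(len(sets_for_1))   # 18
print(sets_for_1[0])     # ((2, 3), (5, 6))
```

Each element of the list names one denominator of (5.19), so the 18 entries correspond to the 18 evaluations that are averaged for a variable.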

No exact standard for judging the consistency of the separate values is available, 
but the following simple guide may be used. The calculated communality may be 
regarded as a variance, and the usual formula for the standard error of a variance 
applied to it, namely, 

(5.23)  $\sigma_{\sigma^2} = \sigma^2 \sqrt{2/N}$.

If the variations of a set of values for a communality from their mean can be shown 
to be insignificant by use of formula (5.23), they would also be insignificant by a
more accurate test. The standard errors for the eight communalities, as given by the 
above formula, are presented in Table 5.4. The maximum variation from the mean 
of the 18 separate values for each variable does not exceed 1.5 times the standard 
error, demonstrating the consistency of these values and justifying the assumed 
rank and the determinations of the communalities. 
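
A minimal Python sketch of this consistency guide follows. The separate evaluations in `trial` are hypothetical numbers invented for illustration (the 18 actual values are not reproduced in the text), and the default factor of 1.5 simply echoes the bound observed in this example; it is not offered as a general rule.

```python
import math

def variance_standard_error(h2, n_obs):
    """Formula (5.23): the communality is treated as a variance."""
    return h2 * math.sqrt(2.0 / n_obs)

def check_consistency(values, n_obs, factor=1.5):
    """Return mean, standard error, and whether the largest deviation stays within factor * SE."""
    mean = sum(values) / len(values)
    se = variance_standard_error(mean, n_obs)
    worst = max(abs(v - mean) for v in values)
    return mean, se, worst <= factor * se

# Hypothetical evaluations of one communality; N = 305 as in Table 5.3.
trial = [0.83, 0.86, 0.80, 0.85, 0.84, 0.81]
mean, se, consistent = check_consistency(trial, n_obs=305)
print(round(mean, 3), round(se, 3), consistent)
```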

Table 5.4 

Communalities for Eight Physical Variables 


Statistic 

Communality 
Standard error 



The direct application of the foregoing method of computing communalities is 
practical when m is one or two. For a larger number of factors the direct procedure 
becomes too cumbersome with the use of a desk calculator. While it is conceivable 
that the direct methods, involving the calculation of determinants of high order, 
might be feasible on electronic computers, it appears to be more economical to 
employ special techniques of factor analysis that do not require estimates of
communality in advance but produce communalities as by-products (see chaps. 9 and
10).


5.5. Theoretical Solution for Communality 

Before considering practical means of approximating the communalities, a
presentation is made in this section of the theoretical solution to the problem without
explicit use of the rank of R. Employing the fundamental factor model (2.9) and the
notation introduced in 4.10, any variable $z_j$ can be represented in the total-factor
space of m common factors and n unique factors as follows:

(5.24)  $z'_j = a_{j1}F_1 + a_{j2}F_2 + \cdots + a_{jm}F_m + d_j U_j$,

and in the common-factor space this becomes:

(5.25)  $z''_j = a_{j1}F_1 + a_{j2}F_2 + \cdots + a_{jm}F_m$.

The correlation between a variable and its “common” part is

(5.26)  $r_{z'_j z''_j} = \dfrac{a_{j1}^2 + a_{j2}^2 + \cdots + a_{jm}^2}{N(z'_j)\,N(z''_j)} = \dfrac{h_j^2}{1 \cdot h_j} = h_j$,

where the lengths of the vectors in the total-factor space and in the common-factor
space are given in Table 4.1. But the multiple correlation of variable $z_j$ with the
linear combination of the m factors (represented by $z''_j$) is defined as the simple
correlation coefficient between them, namely,

(5.27)  $R_{z_j \cdot F_1 F_2 \cdots F_m} = r_{z_j z''_j}$.

Then, since $z'_j$ is the mathematical model of $z_j$, the expression (5.26) may be put
in the form

(5.28)  $h_j^2 = R^2_{z_j \cdot F_1 F_2 \cdots F_m}$,

which states that the communality of a variable is equal to the squared multiple 
correlation of that variable with the common factors. 

While this result has been known since 1936 [403, 107, 172], it was not until
recently [289, 292, 484] that a suggestion was made to adapt it for use in
approximating the communalities. Since the common parts $z''_j$ as well as the $F_p$ span the
common-factor space, it follows from (5.25) and (5.28) that

(5.29)  $h_j^2 = R^2_{z_j \cdot z''_1 z''_2 \cdots z''_n}$.

Kaiser [289, p. 5] argues that if the n vectors $z''_j$ are to lie in the common-factor space
of m dimensions, then some of them will be redundant for computing the squared
multiple correlation in (5.29). Since it is not necessary to include $z''_j$ in the regression
equation for $h_j^2$, this equation becomes

(5.30)  $h_j^2 = R^2_{z_j \cdot z''_1 z''_2 \cdots z''_{j-1} z''_{j+1} \cdots z''_n}$.

This states that the communality of a given variable is given by the squared multiple
correlation of that variable on the n - 1 common parts of the remaining variables.
The obvious difficulty of this apparent solution is that, in effect, the communalities
of the n - 1 remaining variables must be known in order to compute the squared
multiple correlation, i.e., the desired communality. In an attempt to obtain an
approximation to the n communalities, Kaiser proposed an iterative procedure for
use with an electronic computer. Unfortunately, the process converges only for
restricted matrices and he has concluded that the method has “... no practical use
because of its inability to solve the communality problem for empirical matrices”
[292, p. 10].

There is hope that the iterative approximations to the theoretical solution may 
yet prove practical. Guttman [182] proposes a generalization and improvement of 
the preliminary procedure suggested by Kaiser [289]. This method is designed to 
solve for the diagonal matrix H with proper communalities* without requiring 
a priori approximations to the rank of R nor the extraction of common factors. The 
final solution is obtained by an iterative process in which trial values $H_t$ eventually
converge (for conditions, see [182, pp. 5-7]) to the desired matrix of communalities H.

It is convenient to designate the successive expressions for the reduced correlation
matrix by $R_t$, so that equation (5.21) becomes

(5.31)  $R_t = R_0 + H_t$   $(t = 1, 2, \ldots)$

for any approximation $H_t$ to the communalities. The iterative procedure involves
the calculation of the inverse of $R_t$ at each stage; and the diagonal matrix with the
principal diagonal of $R_t^{-1}$ is designated $D_t$. Then, an entire class of iterative
procedures is given [182, p. 2] by the following recurrence formula:

(5.32)  $H_{t+1} = H_t - \varepsilon D_t^{-1}$   $(t = 1, 2, \ldots)$

where $\varepsilon$ is some positive number. The iterative procedures implied by (5.32) differ
from one another according to the choice of $\varepsilon$ and the initial approximation $H_1$.
The specific iterative process which Kaiser [289] employed is (5.32) with $\varepsilon = 1$
and $H_1 = I$. As noted above, this process generally does not converge for empirical
data. Guttman [182] makes several recommendations, the principal one being to
take $\varepsilon = 1/2$ and $H_1$ as the diagonal matrix of squared multiple correlations of each
variable with the n - 1 remaining variables (further use of this concept is made in
5.7). While this work seems promising, the real proof of the usefulness of these
procedures must await the test of time.

* As noted in 3.2, paragraph 15, communality estimates are considered proper if they preserve
the Gramian properties of the correlation matrix.
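
A single step of the recurrence (new diagonal = old diagonal minus a positive constant times the reciprocals of the diagonal of the inverse of the current reduced matrix) can be sketched as follows. The helper names `inverse` and `recurrence_step` are assumptions of this sketch, the Gauss-Jordan inverse is a plain stand-in for whatever inversion routine is available, and the data are the six hypothetical variables of Table 5.5. No claim about convergence is made; only one step is carried out.

```python
# Correlations among the six hypothetical variables (Table 5.5), unities in the diagonal.
R = [
    [1.00, 0.72, 0.75, 0.49, 0.42, 0.28],
    [0.72, 1.00, 0.78, 0.42, 0.36, 0.24],
    [0.75, 0.78, 1.00, 0.35, 0.30, 0.20],
    [0.49, 0.42, 0.35, 1.00, 0.42, 0.28],
    [0.42, 0.36, 0.30, 0.42, 1.00, 0.24],
    [0.28, 0.24, 0.20, 0.28, 0.24, 1.00],
]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def recurrence_step(r, h, eps):
    """One step: h_new[j] = h[j] - eps / (j-th diagonal element of the inverse of R_t)."""
    n = len(h)
    r_t = [[h[i] if i == j else r[i][j] for j in range(n)] for i in range(n)]
    inv = inverse(r_t)
    return [h[j] - eps / inv[j][j] for j in range(n)]

h1 = [1.0] * 6                          # start with unities in the diagonal
h2 = recurrence_step(R, h1, eps=1.0)    # first step with eps = 1
print([round(x, 2) for x in h2])
```

With unities as the starting diagonal and a constant of one, the first step yields one minus the reciprocal of each diagonal element of the inverse, i.e. the squared multiple correlations discussed in 5.7.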

From the preceding discussion it becomes apparent that there is no easy road to a 
solution of the communality problem. In the following sections, several practical 
avenues are suggested. 

5.6. Arbitrary Approximations to Communality 

Literally dozens of methods for estimating communalities have been proposed, 
but none of them has been shown to be superior to any of the others on the basis of 
closer approximation to the “true” values. As a matter of fact none of the methods 
has been demonstrated to lead to minimal rank of the correlation matrix. The choice 
among the various methods of approximation is generally made on the basis of
available computational facilities and the disposition of the investigator to employ that
method which intuitively seems best to approach the concept of communality. As a 
saving grace, there is much evidence in the literature that for all but very small sets 
of variables, the resulting factorial solutions are little affected by the particular 
choice of “communalities” in the principal diagonal of the correlation matrix. 

Of the many arbitrary estimates of communality, the simplest is the highest
correlation of a given variable $z_j$ from among its correlations with all the other
variables of the given set. This simple procedure has been used by Thurstone and
his followers in a large number of studies involving the centroid solution (see 8.9).
They found this simple method of estimating communalities useful for large
correlation matrices, but would not recommend it for small numbers of variables.

Another method for approximating the communality of a variable $z_j$ is to employ
a triad, i.e.,

(5.33)  $h_j^2 = r_{jk} r_{jl} / r_{kl}$,

where k and l are the two variables which correlate highest with the given variable.
It can readily be seen that this formula would have the effect of moderating an
extremely high, exceptional correlation. Still another estimate is given by the average
of all the correlations of a given variable with each of the remaining ones, viz.,

(5.34)  $h_j^2 = \sum_{k=1}^{n} r_{jk} / (n - 1)$   $(k \neq j)$.
A more satisfactory procedure for estimating communality follows from 5.4. The
approximate rank of the correlation matrix is assumed, making use of the groupings 
of variables, in order to get suitable estimates of the communality. This method 
involves the calculation for each communality of the average of expressions like 
(5.11) for m = 1 and (5.19) for m = 2, where m is the assumed rank of the correlation 
matrix. For small sets of variables it is expected that there will be only a few factors 
(see 5.2). It may then be sufficiently accurate to assume rank one or two and employ 
the method of 5.4 for calculating approximations to the communalities. In practice, 
when the rank exceeds two, this method is not feasible because of the complexity 
of the formulas and the computational labor. 

One simplification of this technique is accomplished by sectioning the correlation 
matrix into sub-groups of p variables, each approximately of rank one. The unit-rank 
estimates of the communalities may then be calculated by means of the formula: 


(5.35)  $h_j^2 = \dfrac{1}{\nu} \sum r_{jk} r_{jl} / r_{kl}$   $(k, l \neq j;\ k < l)$,

where $\nu = \binom{p-1}{2}$ is the total number of different triads that can be arranged for
the given variable $z_j$ out of the subset of p variables which comprise the section of the
correlation matrix of unit rank. This procedure seems much more satisfactory than 
estimating the communality by a single triad as in (5.33). 
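
The unit-rank estimate, taken as the mean of all triads that can be formed for a variable within its own group, can be sketched in Python. The function name `unit_rank_communality`, the zero-based indexing, and the choice of groups are assumptions of the sketch; the data are the six hypothetical variables of Table 5.5, split into variables 1-3 and 4-6 as in 5.8.

```python
from itertools import combinations

# Correlations among the six hypothetical variables (Table 5.5), unities in the diagonal.
R = [
    [1.00, 0.72, 0.75, 0.49, 0.42, 0.28],
    [0.72, 1.00, 0.78, 0.42, 0.36, 0.24],
    [0.75, 0.78, 1.00, 0.35, 0.30, 0.20],
    [0.49, 0.42, 0.35, 1.00, 0.42, 0.28],
    [0.42, 0.36, 0.30, 0.42, 1.00, 0.24],
    [0.28, 0.24, 0.20, 0.28, 0.24, 1.00],
]

def unit_rank_communality(r, j, group):
    """Mean of the triads r_jk * r_jl / r_kl over all pairs k < l in the group (k, l != j)."""
    others = [v for v in group if v != j]
    triads = [r[j][k] * r[j][l] / r[k][l] for k, l in combinations(others, 2)]
    return sum(triads) / len(triads)

group_one = [0, 1, 2]   # variables 1, 2, 3 (zero-based indices)
group_two = [3, 4, 5]   # variables 4, 5, 6

estimates = [unit_rank_communality(R, j, group_one if j in group_one else group_two)
             for j in range(6)]
print([round(x, 3) for x in estimates])
```

With p = 3 variables in a group there is only one triad per variable, so the average reduces to a single term; larger groups average over more triads.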


5.7. Complete Approximations to Communality 

It may be convenient to distinguish the foregoing arbitrary estimates of
communality, which generally involve only a few correlations, from estimates which are
based upon the entire correlation matrix. The latter will be designated as “complete
approximations” to communality. One such method might be the direct computation
(best determination rather than estimate) of communalities by the method of 5.4.
This method involves an approximation to the rank, say m, and then taking as $h_j^2$ the
average of all evaluations, from the (m + 1)-order determinants set equal to zero.
Such a procedure is not feasible for m > 2, but perhaps some day the rapid selection
of minors and the solution of such determinants will be programmed for high-speed
electronic computers. Albert’s algebraic solution referred to in 5.4 and the methods
discussed in 5.5 are theoretical rather than practical instances of complete
approximations to communality.

Perhaps the simplest procedure that would fall under the heading of “complete
approximation” to communality involves, in essence, the calculation of the first
centroid factor (see 8.9). As a start, the highest correlation for each variable is inserted
in the principal diagonal of the correlation matrix. Then the estimate of each
communality is the ratio of the square of the column sum to the total sum of all the
correlations in the matrix, i.e.,

(5.36)  $h_j^2 = \left( \sum_{k=1}^{n} r_{jk} \right)^2 \Big/ \sum_{k=1}^{n} \sum_{l=1}^{n} r_{kl}$,

where the self correlation is the highest correlation of the given variable with all
other variables. While this formula does employ all correlations in the matrix, it will,
nonetheless, tend to give an underestimate of the communality since it results from
the employment of only one centroid factor.

A method similar to the preceding, but using the average instead of the highest 
correlation, leads to the following formula: 


(5.37)  $h_j^2 = [n/(n - 1)] \left( \sum_{k=1}^{n} r_{jk} \right)^2 \Big/ \sum_{k=1}^{n} \sum_{l=1}^{n} r_{kl}$   $(k \neq j$ and $k \neq l)$,

in which the diagonal values are actually ignored. This formula employs the square
of the coefficient of the first averoid factor [243, p. 193]. Just as in the case of formula
(5.36), so also will formula (5.37) tend to give underestimates of the communalities.
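
Both estimates are easy to compute directly from the correlation matrix. The following Python sketch applies them to the six hypothetical variables of Table 5.5; the function names are assumptions of the sketch, and "highest correlation" is taken literally as the largest off-diagonal entry in each row.

```python
# Correlations among the six hypothetical variables (Table 5.5), unities in the diagonal.
R = [
    [1.00, 0.72, 0.75, 0.49, 0.42, 0.28],
    [0.72, 1.00, 0.78, 0.42, 0.36, 0.24],
    [0.75, 0.78, 1.00, 0.35, 0.30, 0.20],
    [0.49, 0.42, 0.35, 1.00, 0.42, 0.28],
    [0.42, 0.36, 0.30, 0.42, 1.00, 0.24],
    [0.28, 0.24, 0.20, 0.28, 0.24, 1.00],
]

def centroid_communalities(r):
    """Formula (5.36): highest correlation in the diagonal, squared column sum over grand total."""
    n = len(r)
    filled = [[max(r[i][k] for k in range(n) if k != i) if i == j else r[i][j]
               for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in filled)
    return [sum(row) ** 2 / total for row in filled]

def averoid_communalities(r):
    """Formula (5.37): diagonal ignored, scaled by n / (n - 1)."""
    n = len(r)
    row_sums = [sum(r[j][k] for k in range(n) if k != j) for j in range(n)]
    total = sum(row_sums)
    return [n / (n - 1) * s ** 2 / total for s in row_sums]

centroid = centroid_communalities(R)
averoid = averoid_communalities(R)
print(round(centroid[0], 2), round(averoid[0], 2))   # 0.73 0.68
```

For the first variable these reproduce the values (3.41)^2/16.00 and (6/5)(2.66)^2/12.50 worked out in 5.8.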

Communality estimates may be calculated by use of the component analysis model 
(2.8), and then a conventional factor analysis solution in terms of the model (2.9)
may be obtained. While this may sound involved, it is actually quite practical with 
electronic computers. Thus, a correlation matrix (with ones in the diagonal) of 
order n is analyzed into n principal components and the associated n eigenvalues 
(see chap. 8). It is then assumed that the dimension of the common-factor space is 
equal to the number of principal components for which the eigenvalues are greater 
than one [296, pp. 144-46]. The variance contributed (by this reduced number of 
principal components) to each variable is taken as the estimate of its communality. 
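
A rough Python sketch of this component-based estimate is given below for the six hypothetical variables of Table 5.5. The cyclic Jacobi routine `jacobi_eigen` is a stand-in chosen only to keep the sketch free of external libraries; it is not the computer program referred to in the text, and all function names are assumptions.

```python
import math

# Correlations among the six hypothetical variables (Table 5.5), unities in the diagonal.
R = [
    [1.00, 0.72, 0.75, 0.49, 0.42, 0.28],
    [0.72, 1.00, 0.78, 0.42, 0.36, 0.24],
    [0.75, 0.78, 1.00, 0.35, 0.30, 0.20],
    [0.49, 0.42, 0.35, 1.00, 0.42, 0.28],
    [0.42, 0.36, 0.30, 0.42, 1.00, 0.24],
    [0.28, 0.24, 0.20, 0.28, 0.24, 1.00],
]

def jacobi_eigen(a, max_sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix by Jacobi rotations."""
    n = len(a)
    a = [list(row) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(i)) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                d = a[q][q] - a[p][p]
                sign = 1.0 if d >= 0 else -1.0
                theta = 0.5 * math.atan2(2.0 * a[p][q] * sign, abs(d))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # A <- J' A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):      # V <- V J (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def component_communalities(r):
    """Variance contributed to each variable by the components with eigenvalue above one."""
    values, vectors = jacobi_eigen(r)
    kept = [p for p, lam in enumerate(values) if lam > 1.0]
    estimates = [sum(values[p] * vectors[j][p] ** 2 for p in kept) for j in range(len(r))]
    return estimates, len(kept)

estimates, n_kept = component_communalities(R)
print(n_kept, [round(x, 3) for x in estimates])
```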

In consideration of the indefiniteness of the communalities (the evasiveness of a
fine hypothetical concept), it becomes very attractive to employ some objective
measures even though they allegedly are not “communalities.” It has been argued,
and substantiated by empirical evidence, that it matters little what values are placed 
in the principal diagonal of the correlation matrix when the number of variables 
is large (say, n > 20). In such a case the actual arithmetic impact of these few diagonal 
values in relation to the many numbers off the diagonal is so small that the factorial 
results are not affected very much. Of course, this is a specious argument which
would not stand up under a rigorous statistical test were one available. Nonetheless, 
the desire to employ well-defined, objective measures for the diagonal entries has 
led to some interesting procedures. 

One method for estimating communality which has the semblance of being 
objective involves iteration by refactoring. This procedure would be limited to small 
matrices if the work were to be done on a desk calculator, but certainly would be 
suitable for any matrix if a high-speed electronic computer were available. The
operation is initiated by placing unities, zeros, or any other values in the principal
diagonal and deciding a priori on the number of factors. Typically, such a procedure would
involve: (a) the calculation of a principal-factor solution (see chap. 8), which is most 
readily adaptable to an electronic computer; (b) the determination of the sum of 
squares of factor coefficients (for the predetermined number of factors) as new 
estimates of communality; (c) the calculation of another principal-factor solution; 
and (d) repeating this process as many times as necessary until the recomputed 
diagonal values do not change from the preceding set. 
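
Steps (a) through (d) can be sketched in Python as follows, again for the six hypothetical variables of Table 5.5 with two factors assumed in advance. The Jacobi eigen-routine, the stopping tolerance, the iteration cap, and every function name are assumptions of this sketch rather than features of any program cited in the text, and nothing here should be read as a proof of convergence.

```python
import math

# Correlations among the six hypothetical variables (Table 5.5), unities in the diagonal.
R = [
    [1.00, 0.72, 0.75, 0.49, 0.42, 0.28],
    [0.72, 1.00, 0.78, 0.42, 0.36, 0.24],
    [0.75, 0.78, 1.00, 0.35, 0.30, 0.20],
    [0.49, 0.42, 0.35, 1.00, 0.42, 0.28],
    [0.42, 0.36, 0.30, 0.42, 1.00, 0.24],
    [0.28, 0.24, 0.20, 0.28, 0.24, 1.00],
]

def jacobi_eigen(a, max_sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix by Jacobi rotations."""
    n = len(a)
    a = [list(row) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(i)) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                d = a[q][q] - a[p][p]
                sign = 1.0 if d >= 0 else -1.0
                theta = 0.5 * math.atan2(2.0 * a[p][q] * sign, abs(d))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def refactor_once(r, h, m):
    """Steps (a)-(b): principal factors of the reduced matrix, then sums of squared coefficients."""
    n = len(r)
    reduced = [[h[i] if i == j else r[i][j] for j in range(n)] for i in range(n)]
    values, vectors = jacobi_eigen(reduced)
    top = sorted(range(n), key=lambda p: values[p], reverse=True)[:m]
    return [sum(max(values[p], 0.0) * vectors[j][p] ** 2 for p in top) for j in range(n)]

def iterate_communalities(r, start, m, tol=1e-6, max_iter=500):
    """Steps (c)-(d): refactor until the diagonal values stop changing (or the cap is hit)."""
    h = list(start)
    for step in range(1, max_iter + 1):
        new_h = refactor_once(r, h, m)
        if max(abs(a - b) for a, b in zip(new_h, h)) < tol:
            return new_h, step
        h = new_h
    return h, max_iter

start = [max(R[j][k] for k in range(6) if k != j) for j in range(6)]  # highest correlations
estimates, steps = iterate_communalities(R, start, m=2)
print(steps, [round(x, 3) for x in estimates])
```

One property that can be checked without running the loop: if the exact rank-two communalities of this artificial example are placed in the diagonal, a single refactoring returns them unchanged.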
Wrigley [538] employed such iteration-by-refactoring methods in an excellent
empirical study, comparing some fifteen different methods of initial estimation of 
communality. For five methods of estimation, 100 iterations were made involving 
very extensive calculations. Although there is no formal proof of the convergence 
of the iterative process, this study would seem to indicate that it does. Convergence 
was found to be generally more rapid for low estimates than for high, with unities 
being especially poor as a starting point for finding the communalities. Wrigley 
concludes [538, p. 26] that for work with a desk calculator the most effective method 
is a modification of the arbitrary estimates of the highest correlation (for variables 
with high correlations the estimates are a little higher than the highest correlations, 
and for variables with low correlations the estimates are a little lower than the 
highest correlations). For work with an electronic computer, the squared multiple 
correlation of each variable with the remaining ones was found to be the best starting 
point. 

The squared multiple correlation (SMC) of each variable with the remaining n — 1 
observed variables has much to recommend it as an approximation to communality, 
quite aside from the foregoing convergence property. Wrigley [535] suggests that the 
SMC’s be called the “observed communalities” since they, and not the theoretical 
minimal-rank communalities, measure predictable common variance among the 
observed correlations. These values are certainly objective and they can be
determined uniquely, without iteration, by use of the square root method (see 3.4) for a
small set of variables or by means of an electronic computer for a large set. To obtain
the SMC’s for all the variables in a set, it is expeditious to calculate the inverse $R^{-1}$
of the correlation matrix R (with unities in the diagonal), and then the SMC for
variable $z_j$ is given by

(5.38)  $\mathrm{SMC}_j = R^2_{j \cdot 12 \cdots (j-1)(j+1) \cdots n} = 1 - \dfrac{1}{r^{jj}}$,

where $r^{jj}$ is the diagonal element in $R^{-1}$ corresponding to variable $z_j$. These values
are frequently calculated as an incidental step in the computer program for a
principal-factor solution (see 8.7).
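
The computation of all SMC’s from one matrix inverse can be sketched in Python. The Gauss-Jordan routine below is used only for self-containment and is not the square root method of 3.4; the function names are assumptions, and the data are the six hypothetical variables of Table 5.5.

```python
# Correlations among the six hypothetical variables (Table 5.5), unities in the diagonal.
R = [
    [1.00, 0.72, 0.75, 0.49, 0.42, 0.28],
    [0.72, 1.00, 0.78, 0.42, 0.36, 0.24],
    [0.75, 0.78, 1.00, 0.35, 0.30, 0.20],
    [0.49, 0.42, 0.35, 1.00, 0.42, 0.28],
    [0.42, 0.36, 0.30, 0.42, 1.00, 0.24],
    [0.28, 0.24, 0.20, 0.28, 0.24, 1.00],
]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def squared_multiple_correlations(r):
    """SMC_j = 1 - 1 / r^jj, with r^jj the j-th diagonal element of the inverse of R."""
    inv = inverse(r)
    return [1.0 - 1.0 / inv[j][j] for j in range(len(r))]

smc = squared_multiple_correlations(R)
print([round(x, 3) for x in smc])
```

In this artificial example the exact communalities are known (.74, .72, .89, .49, .36, .16), so each SMC can be compared with the value it is meant to approximate.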

The SMC has another very important property: it is the lower bound for the
communality, i.e.,

(5.39)  $R^2_{j \cdot (n-1)} \le h_j^2$,

where (n - 1) stands for the set of variables excluding $z_j$. This property was first
given by Roff [403], and explicitly proven by Dwyer [107]. Largely because of this
property, Guttman [181] recommends the SMC as the “best possible” estimate of
communality. The characteristics that make the SMC’s the “best possible” estimates
include: (1) actual equality in (5.39) for many correlation matrices for which minimum
rank m is attained; and (2) the tendency of the SMC’s to approach the communalities
if the ratio of m to n tends to zero as n approaches infinity.

The insertion of SMC’s in the principal diagonal produces a reduced correlation
matrix which is not Gramian, in general. While the SMC’s are not proper estimates [...]

Since some approximation is made to the communalities in conventional factor
analysis, there is an advantage in knowing the direction of the error from the true
values. The SMC’s are known to be less than (or at most equal to) the
communalities. This logical consideration, together with the preceding properties,
strongly recommends the SMC’s as the most desirable approximations to
communalities when suitable computing facilities are available. Fortunately, the impact
of poor estimates is less relevant for large matrices, so that one may resort to arbitrary
approximations if he does not have access to an electronic computer. On the other
hand, the work of computing the SMC’s is not so great for small matrices, where the
approximations become much more important.

An implication of the use of SMC’s is that the distinction between common and
unique variance as indicated by the factor model (2.9) is a function of the particular
variables under consideration rather than a hypothetical domain or universe of
variables. Any SMC measures variance common to the particular variable and the
remaining n - 1 variables selected for study, while the communality measures
variance common to the variable and the factors resulting from the set of n variables.
It should be recognized that in either event, the distinction between common and
specific variance is relative to the particular set of variables. Wrigley challenges the
[...] of reducing the number of factors,
and concludes [535, p. 96]: “The principal claim of communalities rests on the fact
[...] to the correlations actually
observed. The psychological advantages, however, appear to rest with the squared
multiple correlations, which measure the proportion of common variance in any
particular test selection.” It is interesting to note that communalities with the
[...] mentioned by Wrigley are those determined by the [...] (see chap. 9).

When factors are extracted one at a time, the diagonal values in each residual
correlation matrix may be retained as computed or estimated anew. In general, when
one of the “arbitrary approximations” to communality is employed, the values in
the principal diagonal of each residual matrix should be re-estimated by the method
used originally. For example, when the highest correlation for each variable is taken
as its communality, the highest value in each column would again be used for the
diagonal entry, in place of the value actually computed, in the matrix of residual
correlations with the first factor removed.

On the other hand, when “complete approximations” to communality are
employed, the diagonal values are not altered in the succeeding residual matrices
except [...] point where all the diagonal values [...]
With the use of electronic computers, all factors are extracted at one time. Hence it
is very important, when a method requires estimates of communality (rather than
an estimate of the number of common factors), that “complete approximations” be
employed.

5.8. Examples of Approximations to Communality 

In order to illustrate the approximation methods of the last two sections, six
hypothetical variables are employed and their correlations are given in Table 5.5.
Because this is a “textbook example,” with no sampling or rounding errors in the
data, it is possible to obtain an exact algebraic solution for the communalities from
a knowledge of the rank of the correlation matrix. Even a casual observation of the
correlations in Table 5.5 indicates a clustering of the first three variables, while the
remaining ones correlate no more amongst themselves than with the first three
variables. It is then reasonable to assume a rank of 2 for the correlation matrix, and
to test this hypothesis by the method of 5.3.

Table 5.5

Correlations Among Six Hypothetical Variables

Variable     1     2     3     4     5

   2        .72
   3        .75   .78
   4        .49   .42   .35
   5        .42   .36   .30   .42
   6        .28   .24   .20   .28   .24


According to Table 5.2 there are four necessary conditions that the correlations
must satisfy in order that the six variables be describable in terms of two common
factors. These conditions are arrived at by equating the right-hand members of (5.17).
The first three of these expressions are identically equal to .74, while the last two
are indeterminate.† The necessary conditions among the correlations are satisfied
mathematically, and the true rank of the reduced correlation matrix is two. The
communality of each of the six variables is obtained by means of equation (5.19),
with the several different evaluations for each variable being identical in this
hypothetical example. These results are labelled “Actual” in Table 5.6. Following these
actual communalities, the different approximations of the two preceding sections
are presented in the successive columns of this table. It is only for heuristic reasons
that the several approximations to the communalities are summarized in a single
table, rather than to imply any advantage of a particular estimate over another.

† [...] variables 4, 5, 6 and any one of 1, 2, or 3
vanish; as a consequence the minors in the numerators of these expressions also vanish. It is
thus apparent that four such variables are describable in terms of only one common factor.
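
The statement that the reduced correlation matrix is of rank two can also be checked numerically. The Python sketch below does this by a direct route that differs from the conditions of Table 5.2: it places the “Actual” communalities of Table 5.6 in the diagonal and confirms that every third-order minor vanishes. The function names are assumptions of the sketch.

```python
from itertools import combinations

# Correlations among the six hypothetical variables (Table 5.5), unities in the diagonal.
R = [
    [1.00, 0.72, 0.75, 0.49, 0.42, 0.28],
    [0.72, 1.00, 0.78, 0.42, 0.36, 0.24],
    [0.75, 0.78, 1.00, 0.35, 0.30, 0.20],
    [0.49, 0.42, 0.35, 1.00, 0.42, 0.28],
    [0.42, 0.36, 0.30, 0.42, 1.00, 0.24],
    [0.28, 0.24, 0.20, 0.28, 0.24, 1.00],
]
ACTUAL = [0.74, 0.72, 0.89, 0.49, 0.36, 0.16]   # "Actual" column of Table 5.6

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def with_diagonal(r, diag):
    """Copy of r with the given values placed in the principal diagonal."""
    n = len(r)
    return [[diag[i] if i == j else r[i][j] for j in range(n)] for i in range(n)]

def largest_third_order_minor(matrix):
    """Largest absolute value among all 3 x 3 minors of the matrix."""
    triples = list(combinations(range(len(matrix)), 3))
    return max(abs(det3([[matrix[i][j] for j in cols] for i in rows]))
               for rows in triples for cols in triples)

reduced = with_diagonal(R, ACTUAL)
print(largest_third_order_minor(reduced) < 1e-12)   # True: rank is at most two
print(largest_third_order_minor(R) > 0.5)           # True: with unities the rank exceeds two
```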


Table 5.6

Communality Estimates for Six Hypothetical Variables

Variable  Actual  Highest r  Triad   Average  Unit-Rank  Centroid  Averoid  SMC
                             (5.33)  (5.34)   (5.35)     (5.36)    (5.37)   (5.38)

   1       .74      .75       .69     .53      .69        .73       .68     .66
   2       .72      .78       .75     .50      .75        .68       .61     .66
   3       .89      .78       .81     .48      .81        .62       .54     .69
   4       .49      .49       .39*    .39      .49        .38       .37     .32
   5       .36      .42       .36     .35      .36        .29       .29     .25
   6       .16      .28       .16     .25      .16        .14       .14     .12

* Average of the two possible values: .29 and .49.


The procedure for obtaining the numerical values for each method will be indicated.
Of course, the method of highest correlation is self evident. For the triad method,
formula (5.33) is employed, and for the first variable this becomes

$h_1^2 = r_{13} r_{12} / r_{23} = (.75)(.72)/.78 = .69$,

because variables 2 and 3 correlate highest with it. A little ambiguity arises in
applying this method to variable 4. The highest correlation is $r_{41} = .49$ but there is a
tie for the second highest, viz., $r_{42} = .42$ and $r_{45} = .42$. Either of the following two
determinations is appropriate:

$h_4^2 = r_{41} r_{42} / r_{12} = (.49)(.42)/.72 = .29$

or

$h_4^2 = r_{41} r_{45} / r_{15} = (.49)(.42)/.42 = .49$,

and the average of these values is recorded in Table 5.6.

Approximations to the communalities by the average of all correlations of a given
variable with the others are obtained by formula (5.34). While it would appear from
this hypothetical example that the average correlation tends to give too low an
estimate of communality, at least in one instance (variable 6) it is considerably
higher than the actual value.
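
The three arbitrary estimates discussed so far (highest correlation, triad, and average) can be reproduced with a short Python sketch. The function names are assumptions, and the tie rule coded in `triad_estimate` (average the triads over all variables tied for second highest) is just one way of mechanizing the treatment of variable 4 described above; it does not attempt to handle every possible pattern of ties.

```python
# Correlations among the six hypothetical variables (Table 5.5), unities in the diagonal.
R = [
    [1.00, 0.72, 0.75, 0.49, 0.42, 0.28],
    [0.72, 1.00, 0.78, 0.42, 0.36, 0.24],
    [0.75, 0.78, 1.00, 0.35, 0.30, 0.20],
    [0.49, 0.42, 0.35, 1.00, 0.42, 0.28],
    [0.42, 0.36, 0.30, 0.42, 1.00, 0.24],
    [0.28, 0.24, 0.20, 0.28, 0.24, 1.00],
]

def highest_correlation(r, j):
    """Largest correlation of variable j with any other variable."""
    return max(r[j][k] for k in range(len(r)) if k != j)

def average_correlation(r, j):
    """Formula (5.34): mean of the correlations of variable j with the others."""
    n = len(r)
    return sum(r[j][k] for k in range(n) if k != j) / (n - 1)

def triad_estimate(r, j):
    """Formula (5.33); triads are averaged when several variables tie for second highest."""
    others = sorted((k for k in range(len(r)) if k != j), key=lambda k: -r[j][k])
    first = others[0]
    second_value = r[j][others[1]]
    tied = [l for l in others[1:] if r[j][l] == second_value]
    return sum(r[j][first] * r[j][l] / r[first][l] for l in tied) / len(tied)

highest = [highest_correlation(R, j) for j in range(6)]
triad = [triad_estimate(R, j) for j in range(6)]
average = [average_correlation(R, j) for j in range(6)]
for name, column in (("highest", highest), ("triad", triad), ("average", average)):
    print(name, [round(x, 3) for x in column])
```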

It was suggested in 5.6 that the communalities could be approximated from an
assumed rank for the correlation matrix derived from the more-or-less natural
groupings of variables. The simplest instance is to consider the rank to be equal to
the number of groups, and the section of the correlation matrix corresponding to each
group is assumed to be of rank one. The six hypothetical variables fall into two
groups: $G_1 = 1, 2, 3$ and $G_2 = 4, 5, 6$, and the total matrix is assumed to be of rank 2.
In this case, each unit-rank subgroup contains p = 3 variables and the number of
different triads for each variable is simply $\nu = 1$. Formula (5.35) then simplifies to a
single triad [...]:

$h_4^2 = r_{45} r_{46} / r_{56} = (.42)(.28)/.24 = .49$,

which involves different correlations than in either of the two calculations by the
single triad method, yet yields the same result as for one of them.

Four instances of arbitrary approximations have been illustrated up to this point.
Three complete approximation methods will now be demonstrated. For the first
variable of the example, the centroid formula (5.36) yields:

$h_1^2 = \left( \sum_{k=1}^{6} r_{k1} \right)^2 \Big/ \sum_{k=1}^{6} \sum_{l=1}^{6} r_{kl} = (3.41)^2/16.00 = .73.$

The averoid formula (5.37) produces:

$h_1^2 = \frac{6}{5} \left( \sum_{k=2}^{6} r_{k1} \right)^2 \Big/ \sum_{k \ne l} r_{kl} = \frac{6}{5}(2.66)^2/12.50 = .68.$


The general principle noted above, that the centroid and averoid formulas tend to
give under-estimates of communalities, is borne out by the values for the six
hypothetical variables as compared with their “actual” communalities.
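The centroid and averoid estimates can be reproduced for all six variables in a short sketch. Two points are assumptions inferred from the printed sums rather than quoted from formulas (5.36) and (5.37): the centroid sums use the highest correlation of each variable as its diagonal entry (which gives the totals 3.41 and 16.00), and the averoid carries the factor n/(n - 1) with sums over off-diagonal entries only (which gives 2.66 and 12.50). The correlations are again those implied by the pattern of Table 5.8.

```python
import numpy as np

# Pattern of Table 5.8; R holds the implied correlations (assumption: zero residuals).
A = np.array([[.7, .5], [.6, .6], [.5, .8],
              [.7, .0], [.6, .0], [.4, .0]])
R = A @ A.T
h2_actual = np.diag(R).copy()                 # "actual" communalities
n = R.shape[0]

R_off = R - np.diag(h2_actual)                # correlations with a zero diagonal
R_max = R_off + np.diag(R_off.max(axis=1))    # highest r of each variable in the diagonal

# Centroid: squared column sum over the grand total (diagonal included).
centroid = R_max.sum(axis=0) ** 2 / R_max.sum()
# Averoid: same ratio without the diagonal, scaled by n / (n - 1).
averoid = n / (n - 1) * R_off.sum(axis=0) ** 2 / R_off.sum()

for j in range(n):
    print(j + 1, round(centroid[j], 2), round(averoid[j], 2), round(h2_actual[j], 2))
```

For every variable of this artificial example both estimates fall below the "actual" communality, which is the tendency the text describes.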
Finally, the communality is estimated by the squared multiple correlation (SMC)
of each variable with the remaining five variables. The actual process of inverting a
matrix by the square root method is illustrated in Table 3.3 using the current
example. The SMC's are computed according to formula (5.38) from the diagonal
values of the inverse matrix, and the results appear in the last column of Table 5.6.
The property of the SMC as a lower bound to the communality is clearly evident
by comparison with the “actual” values.
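The SMC computation can be sketched as follows. It assumes that formula (5.38) is the usual relation SMC_j = 1 - 1/r^{jj}, where r^{jj} is the j-th diagonal element of the inverse of the correlation matrix with unities in the diagonal. The correlations are those implied by the pattern of Table 5.8, and the inverse is obtained with a general-purpose routine rather than the square root method.

```python
import numpy as np

# Pattern of Table 5.8; the implied correlations stand in for Table 5.5 (assumption).
A = np.array([[.7, .5], [.6, .6], [.5, .8],
              [.7, .0], [.6, .0], [.4, .0]])
h2_actual = (A ** 2).sum(axis=1)          # "actual" communalities

R = A @ A.T
np.fill_diagonal(R, 1.0)                  # complete correlation matrix, unit diagonal

R_inv = np.linalg.inv(R)
smc = 1.0 - 1.0 / np.diag(R_inv)          # squared multiple correlations

for j, (s, h) in enumerate(zip(smc, h2_actual), start=1):
    print(j, round(s, 2), round(h, 2))    # SMC next to the actual communality
```

Each SMC is smaller than the corresponding communality, in line with its role as a lower bound.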

The reader is cautioned not to make unwarranted generalizations about the
various methods for estimating communality. The only purpose
for exhibiting the different approximations side by side in Table 5.6 is to illustrate
the procedures described in earlier sections. It should be remembered that the
data are small in number, artificial, and devoid of sampling errors. While the
procedures leading to the several values in Table 5.6 may be instructive, the actual
choice of method for approximating communality should be based on the
considerations set forth above rather than the apparent superiority of any of the
numerical values over the others.

5.9. Direct Factor Solution 

From the general theory of factor analysis developed so far, it is possible to obtain
a factor solution without recourse to the formal procedures treated in the
remainder of this text. For a small number of variables a very simple factor model
might be appropriate. Then, from a knowledge of the correlations and approximations
to the communalities, a complete factor pattern may be determined. In a practical
situation, with any sizable number of variables, rather involved mathematical and
computational methods are required to get a factor solution.

A direct factor solution for the hypothetical example of the last section will be 
obtained. As noted above, the correlations in Table 5.5 indicate a strong clustering 
of the first three variables but not so for the remaining three. A plausible model* 
is presented in Table 5.7, where uncorrelated factors are assumed for convenience. 


Table 5.7

Factor Pattern Plan*

Variable    $F_0$       $F_1$
   1        $a_{10}$    $a_{11}$
   2        $a_{20}$    $a_{21}$
   3        $a_{30}$    $a_{31}$
   4        $a_{40}$
   5        $a_{50}$
   6        $a_{60}$

* A factor pattern will usually be presented in such a tabular 
form with the coefficients of the respective factors appearing 
in the columns headed by the factors. 


From the correlations in Table 5.5 and the communalities determined in the last 
section (“Actual” in Table 5.6), all the coefficients in the factor pattern can be 
obtained. Since the last three variables involve only one common factor, each of 
their communalities is merely the square of the coefficient of this factor, i.e., 

$h_j^2 = a_{j0}^2$    (j = 4, 5, 6).

Solving for these factor coefficients yields:

$a_{40} = \sqrt{.49} = .7, \quad a_{50} = \sqrt{.36} = .6, \quad a_{60} = \sqrt{.16} = .4.$


Next, the coefficients of $F_0$ for the first three variables can be obtained from the
following formula for the reproduced correlations:

$r'_{jk} = a_{j0} a_{k0}$    (j = 1, 2, 3; k = 4, 5, 6),

which is a simplification of (2.27) for the special factor model of Table 5.7. In this
formula the $a_{k0}$ are known and if the observed correlations are employed in place
of the reproduced correlations, the unknown $a_{j0}$ can be determined. While there are
three possible determinations for each j, and an average might ordinarily be taken,
the three values are identical in the present case. These coefficients follow:

$a_{10} = .7, \quad a_{20} = .6, \quad a_{30} = .5.$

* The pattern plan is plausible because the correlations among the variables 1, 2, 3 are higher
than those among 4, 5, 6. Such a plan is consistent with the hypothesis that an extra factor should
be postulated for a group of variables with higher intercorrelations.

Finally, the coefficients of $F_1$ can be obtained from the following expression for
the communality:

$h_j^2 = a_{j0}^2 + a_{j1}^2$    (j = 1, 2, 3),

or, explicitly:

$a_{j1} = \sqrt{h_j^2 - a_{j0}^2}$    (j = 1, 2, 3),

where the communality and the coefficient of $F_0$ are now known for the first three
variables. The resulting coefficients are:

$a_{11} = \sqrt{.74 - .49} = .5, \quad a_{21} = \sqrt{.72 - .36} = .6, \quad a_{31} = \sqrt{.89 - .25} = .8.$

The complete factor solution is summarized in Table 5.8. 
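The three steps of the direct solution can be traced in a short sketch. The correlation matrix entered below is an assumption: Table 5.5 is not reproduced in this section, so the values are those consistent with the correlations quoted in the text and with zero residuals. The communalities are the "actual" values used above.

```python
import numpy as np

# Assumed observed correlations (consistent with the values quoted in the text).
R = np.array([[1.00, .72, .75, .49, .42, .28],
              [.72, 1.00, .78, .42, .36, .24],
              [.75, .78, 1.00, .35, .30, .20],
              [.49, .42, .35, 1.00, .42, .28],
              [.42, .36, .30, .42, 1.00, .24],
              [.28, .24, .20, .28, .24, 1.00]])
h2 = np.array([.74, .72, .89, .49, .36, .16])   # "actual" communalities

a0 = np.zeros(6)                                # coefficients of F0
a1 = np.zeros(6)                                # coefficients of F1

# Step 1: variables 4, 5, 6 involve F0 only, so a_j0 = sqrt(h_j^2).
a0[3:] = np.sqrt(h2[3:])

# Step 2: r_jk = a_j0 * a_k0 for j = 1, 2, 3 and k = 4, 5, 6 gives three
# determinations of each a_j0; their average is taken.
determinations = R[:3, 3:] / a0[3:]
a0[:3] = determinations.mean(axis=1)

# Step 3: the remaining communality of variables 1, 2, 3 is assigned to F1.
a1[:3] = np.sqrt(h2[:3] - a0[:3] ** 2)

print(np.round(np.column_stack([a0, a1]), 2))
```

In this example the three determinations of each coefficient coincide, so the averaging in step 2 changes nothing. With empirical data they would generally differ.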

Table 5.8

Factor Pattern for Six Hypothetical Variables

Variable    $F_0$    $F_1$
   1         .7       .5
   2         .6       .6
   3         .5       .8
   4         .7
   5         .6
   6         .4

While it was tacitly assumed that the residuals were precisely zero, some dis¬ 
crepancies between original and reproduced correlations would be expected with 
empirical data. The extent of agreement between these values is the measure of 
adequacy of a factor solution, the particular form of the solution being somewhat 
arbitrary. Thus, in the present example the particular factor model of Table 5.7 was 
postulated, but some other model might fit the data equally well.* Some standards 
for making a choice among the possible factor solutions are considered in the next 
chapter. 
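The agreement between original and reproduced correlations can be checked directly. The sketch below assumes the same correlation values as the hypothetical example (Table 5.5 itself is not reproduced here) and measures the fit by the largest absolute off-diagonal residual, which is one simple choice among several possible measures.

```python
import numpy as np

# Factor pattern of Table 5.8 (blank cells entered as zeros).
A = np.array([[.7, .5], [.6, .6], [.5, .8],
              [.7, .0], [.6, .0], [.4, .0]])

# Assumed observed correlations of the hypothetical example.
R_obs = np.array([[1.00, .72, .75, .49, .42, .28],
                  [.72, 1.00, .78, .42, .36, .24],
                  [.75, .78, 1.00, .35, .30, .20],
                  [.49, .42, .35, 1.00, .42, .28],
                  [.42, .36, .30, .42, 1.00, .24],
                  [.28, .24, .20, .28, .24, 1.00]])

R_rep = A @ A.T                      # reproduced correlations
residuals = R_obs - R_rep
np.fill_diagonal(residuals, 0.0)     # only the off-diagonal residuals are judged
max_residual = np.abs(residuals).max()
print(max_residual)                  # zero up to rounding for this artificial example
```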


* See, for example, ex. 9, chap. 8. 



6 

Properties of Different Types of Factor 
Solutions 


6.1. Introduction 

Before leaving the foundations of factor analysis, a preview of what lies ahead 
will be presented in this chapter. Some simple, but fundamental, criteria are set
forth explicitly; and in the subsequent enumeration of the popular factor models, 
the extent of adherence to these criteria is indicated in each case. This chapter serves 
to summarize the most popular types of factor solutions. It gives a brief overview 
of their properties and the distinguishing characteristics among them, but only 
schematic solutions are presented; formulas for the computation of the factor 
coefficients are not given in this chapter. The methods employed in obtaining 
numerical solutions are developed and illustrated in detail in parts ii and iii. 

As noted several times before, there is a basic indeterminacy in factor analysis:
an infinitude of factorizations of a correlation matrix may account for the observed
data equally well. The realization of this fact posed a perplexing problem to factor
analysts during the rapid growth of the subject in the 1930’s and 1940’s: How can a
factor solution be determined which will be acceptable to other workers? While the
principal-factor solution (chap. 8) is unique in the mathematical sense, at least to 
psychologists it is not acceptable as the final form. The search for a “psychologically 
meaningful” solution, which could be obtained by rotation from some arbitrary 
factorization of the correlation matrix, led Thurstone to formulate the concept of 
“simple structure” [468, chap. VI; 477, chap. XIV], which is discussed in 6.2 under 
criterion 10. 

Before enumerating some of the principles which have been proposed to objectify 
the particular choice of factors, a brief review of the problem will be made in geometric 
terms. As pointed out in chapter 4, the observations on a set of variables can be 
regarded as determining a number of vectors (corresponding to the variables) in a 
space equal to the number of subjects. By methods of factor analysis these vectors 
can generally be contained in a space of smaller dimension than the number of 
variables. The coordinate axes of this reduced space are the common factors, and the 
original variables can be expressed linearly in terms of these factors. The determination
of this common-factor space is in no way dependent on the particular coordinate
frame of reference employed. This arbitrariness is represented geometrically by the 
infinite number of rotations possible from one set of coordinate axes to another. 
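The arbitrariness of the reference frame can be illustrated numerically. The sketch below takes the factor pattern of the six-variable hypothetical example and turns the two factor axes through an arbitrary angle (30 degrees is chosen only for illustration). The rotated coefficients differ, yet the reproduced correlations and communalities are unchanged.

```python
import numpy as np

# Orthogonal factor pattern of the six-variable hypothetical example.
A = np.array([[.7, .5], [.6, .6], [.5, .8],
              [.7, .0], [.6, .0], [.4, .0]])

theta = np.deg2rad(30.0)                       # arbitrary rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

B = A @ T                                      # coefficients in the rotated frame

same_fit = np.allclose(B @ B.T, A @ A.T)       # identical reproduced correlations
same_h2 = np.allclose((B ** 2).sum(axis=1), (A ** 2).sum(axis=1))
print(same_fit, same_h2)
print(np.round(B, 3))                          # a different, equally adequate pattern
```

Because T is orthogonal, T T' is the identity, so (AT)(AT)' = AA' for every choice of angle.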

For ease of mathematical description, and sometimes to facilitate psychological 
interpretation, it is common practice to change the frame of reference. In making 
such a transformation of coordinates it must be remembered that the geometric 
configuration, e.g., straight line or swarm of points, is left unaltered. The mathematical 
expressions or formulas describing the geometric configuration may change under 
transformation, but the configuration itself is invariant. 

The mathematician usually is concerned with the geometric configuration only, 
using the frame of reference as a tool, and will prefer one reference system to another 
if it yields a simpler (and more elegant) expression for his configuration. For example, 
the elaborate formula consisting of six terms: 

(6.1) $AX^2 + BY^2 + CXY + DX + EY + F = 0$

represents a geometric configuration (an ellipse) in one (arbitrary) frame of reference, 
while the expression

(6.2) $x^2/a^2 + y^2/b^2 = 1$

represents precisely the same configuration in another (arbitrary) reference frame 
selected so as to make the equation as simple as possible. 
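The change of frame can be carried out numerically. The following sketch reduces a six-term equation of an ellipse to its principal-axes form x^2/a^2 + y^2/b^2 = 1 by a translation to the center and a rotation to the principal axes. The function name `principal_axes` and the test ellipse are hypothetical, and the code assumes the coefficients do describe a real ellipse.

```python
import numpy as np

def principal_axes(A, B, C, D, E, F):
    """Center and semi-axes of the ellipse A x^2 + B y^2 + C xy + D x + E y + F = 0."""
    Q = np.array([[A, C / 2.0], [C / 2.0, B]])   # quadratic part of the equation
    lin = np.array([D, E])                       # linear part
    center = -0.5 * np.linalg.solve(Q, lin)      # translation that removes the linear terms
    const = F + 0.5 * lin @ center               # constant term after the translation
    eigval, _ = np.linalg.eigh(Q)                # rotation to principal axes
    semi_axes = np.sort(np.sqrt(-const / eigval))[::-1]
    return center, semi_axes

# Hypothetical ellipse: semi-axes 3 and 2, turned 30 degrees, centered at (1, -2).
t = np.deg2rad(30.0)
Rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
Q0 = Rot @ np.diag([1 / 3.0 ** 2, 1 / 2.0 ** 2]) @ Rot.T
c0 = np.array([1.0, -2.0])
coeffs = (Q0[0, 0], Q0[1, 1], 2 * Q0[0, 1], *(-2 * Q0 @ c0), c0 @ Q0 @ c0 - 1)

center, semi_axes = principal_axes(*coeffs)
print(np.round(center, 6), np.round(semi_axes, 6))
```

The six coefficients change with the frame, while the recovered center and semi-axes describe the same configuration.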

Unlike the mathematician, the psychologist frequently concerns himself with the 
interpretation of the frame of reference, using the configuration of points merely as 
the vehicle to get to the particular reference axes. Thus, in factor analysis the geometric 
configuration is a swarm of points, each one representing a test and the density of 
the points being a function of the correlations among the tests. A frame of reference 
may be selected for psychological interpretation on the basis of the particular con¬ 
figuration of points but the emphasis in the resulting psychological theory is on the
coordinate axes, not the configuration alone.

The criteria listed in 6.2 provide the basis for the selection of the coordinate axes. 
Depending on how these standards are applied, and whether the empirical data 
satisfy the conditions, one or another of the solutions enumerated in 6.3 and 6.4 
may be obtained. These are referred to as direct solutions since they are obtained by 
direct calculation from the correlation matrix. In contrast, the multiple-factor solu¬ 
tions introduced in 6.5 are always derived solutions, resulting from a transformation 
of one of the direct solutions of 6.3 or 6.4. The computational methods for the direct 
solutions comprise part ii, and for the derived solutions part iii, of this text. Finally, 
in 6.6 a summary is presented of the assumptions, properties, and characteristics of 
the different types of factor solutions. 

6.2. Criteria for Choice of Factor Solutions 

There are certain assumptions that can be made, and restrictions imposed, that 
make it possible to describe a given matrix of correlations uniquely in terms of a 
factorial reference system. However, such a solution is unique only in the sense of
the particular criteria; and, selecting other criteria will generally lead to another 
unique solution. “Unique” is used here merely to denote the fact that any two 
workers accepting the same criteria and following the same procedures will arrive 
at an identical result for a given set of data. Some standards, or guides, which assist 
in defining and delineating the different types of solutions are presented in the 
following paragraphs. 

1. Factor model.—A linear model for the variables is assumed. While the principal 
components, or component analysis, model (2.8) is of some interest, the bulk of the 
text is concerned with the classical factor analysis model (2.9) or (2.35) in matrix 
notation. 

2. Principle of parsimony.— Following the principle common to all scientific 
theory, a law or model should be simpler than the data upon which it is based. Thus, 
the number of common factors should be less than the number of variables; and, in 
the linear description of each variable, the complexity should be low. 

3. Contributions of factors. —A distinction among different factor solutions may 
be made on the basis of the contribution (2.13) of each factor to the variances of the 
variables. One standard may require decreasing contributions, i.e., each successive 
factor contributing a decreasing amount to the total communality. Another standard
may require approximately level contributions of all factors. 

4. Grouping of variables. —In many methods of factor analysis, a notion of the 
clustering or grouping of the variables is implied. Sometimes the groups are only 
roughly approximated, as suggested in 5.4 for the purpose of estimating the rank 
of the correlation matrix. At other times more precise methods are employed for 
careful assignment of the variables to one group or another. Examples of the latter 
are the “coefficient of belonging” (see 7.4), and the method of “cluster analysis” 
developed by Tryon [483]. 



Fig. 6.1.—Intercorrelations exhibiting distinct groups of variables 
If the variables of a study are re-numbered so that those that group together are
in sequence then the table of intercorrelations may be represented schematically as 
in Figures 6.1 or 6.2. The plus signs represent correlations, the double plusses indicate 
even higher correlations among the variables constituting a group, and the zeros 
mean no correlation. In Figure 6.1 an hypothetical situation is depicted in which 
there are four well-pronounced groups of variables with no relationships between 
them. It is extremely unlikely that an empirical matrix of correlations would ever 
approach this simple form. On the other hand, it is to be expected that a set of 
relevant variables would be correlated throughout with higher correlations within 
groups, as indicated in Figure 6.2. 



Fig. 6.2.—Intercorrelations showing grouping of variables with relationships between groups 

5. Frame of reference.—A choice must be made between an orthogonal and an 
oblique reference system, i.e., whether the variables will be described in terms of 
uncorrelated or correlated common factors. The observed correlations are fitted 
equally well by solutions involving factors of either type. Generally it is more con¬ 
venient to start with an orthogonal solution even if the preferred solution is oblique, 
arriving at the latter by transformation from the former. 

6. Point representation: ellipsoidal fit.—In 4.9 two alternative modes of repre¬ 
senting variables geometrically are indicated. It will be recalled that in the point 
representation there is one point for each of the N individuals, referred to a system 
of n reference axes—one for each variable. The points which are plotted in this 
n-space are contained in a common-factor space of only m dimensions (from 
Theorems 4.4 and 4.6). The loci of the swarm of points of uniform frequency density 
are, more or less, concentric, similar, and similarly situated m-dimensional ellipsoids, 
being exactly so for a normally distributed population [547, chap. XII]. It then 
seems natural to take the principal axes of these ellipsoids as the reference axes,
and hence this standard is called ellipsoidal fit. 

7. Vector representation: linear fit.— In chapter 4 it was shown that the variables 
may be regarded as vectors in an N- space, and that the correlation between any two 
is given by the cosine of their angular separation or by the scalar product of their 
vectors projected into the common-factor space. A group of variables having high 
intercorrelations is encompassed by a “cone” with a relatively small generating 
angle. If a vector or reference axis of the common-factor space is chosen in the midst 
of this cone, all variables in the group will correlate high with it. The degree of linear 
fit is measured by the extent to which the vectors representing the variables approach 
the reference axis. By selecting a number of such reference axes, each one passing 
through a cone of vectors, an oblique solution will generally be obtained. 

8. Vector representation: planar fit— The preceding type of geometric fit also 
satisfies planar fit in the sense that the vectors lying close to the reference axes will 
also lie close to the planes of pairs of these axes. In general, the degree of planar fit is 
measured by the proximity of the vectors representing the variables in the common- 
factor space to such reference planes. Variables which satisfy planar fit need not 
satisfy linear fit, i.e., their vectors may lie in the plane between the two reference axes 
without clustering near each one. Thus, greater freedom exists in the choice of the 
factor axes when reference planes are selected rather than specific reference axes close 
to clusters of vectors. If a solution can be found in which all vectors lie in one or 
another of the reference planes, then each variable will involve only two common
factors in its linear description. 

9. Vector representation: hyperplanar fit.— The two preceding criteria employed 
either one-spaces or two-spaces to approximate subsets of vectors representing the 
variables. A natural extension of these ideas is to larger and larger linear spaces— 
just short of the total space which obviously would offer no simplification. By hyper¬ 
planar fit in a common-factor space of m dimensions is meant that each vector 
representing a variable lies in an (m — 1), or smaller, space. When a set of variables 
satisfies this standard, the complexity of any one of them is less than the total number 
of common factors. This does not appear as a very stringent criterion, because it is 
satisfied if each variable is merely of complexity (m — 1) for m common factors. 
However, the strength of this standard lies in the fact that the hyperplane is the 
largest permissible space containing each variable. In other words, it is presumed 
that there will usually be smaller reference spaces which contain certain subgroups 
of variables. In particular, a vector which lies on a reference axis is contained in that 
one-space, and the variable it represents has a complexity of one. Similarly, if a 
vector lies in a reference plane, the variable is of complexity two. It is evident that 
standards 7 and 8 may be considered as special cases of hyperplanar fit. For, if a set 
of variables satisfies the criterion of linear fit, or planar fit, it certainly conforms to 
hyperplanar fit. The converse, of course, is not true generally. 

10. Simple structure principles.— The criteria under consideration here are 
specifically intended for the multiple-factor solutions. Such solutions have been 
vaguely defined in the past, following some intuitive concepts of simple structure 
proposed by Thurstone. He was among the first to strive for an objective definition
of this concept and an accompanying objective procedure for a simple structure 
solution. Since 1935 many individuals have made specific proposals for analytical 
or semi-analytical procedures for the attainment of simple structure or approxima¬ 
tions to it. However, real strides in this direction were not realized until twenty years 
later. The accomplishments in analytical methods are described in chapters 14 
and 15. 

Thurstone’s original three conditions for simple structure were as follows
[468, p. 156]:

1. Each row of the factor structure should have at least one zero. 

2. Each column should have at least m zeros (m being the total number of 
common factors). 

3. For every pair of columns there should be at least m variables whose entries 
vanish in one column but not in the other. 

By way of definition, Thurstone states [477, p. 328]: “If a reference frame can be 
found such that each test vector is contained in one or more of the... coordinate 
hyperplanes, then the combined frame and configuration is called a simple structure.”
This statement requires only the first condition, namely, that each row of the factor 
matrix should have at least one zero. The psychological basis for this definition might 
best be summarized in Thurstone’s words [477, p. 58]: “Just as we take it for granted 
that the individual differences in visual acuity are not involved in pitch discrimination, 
so we assume that in intellectual tasks some mental or cortical functions are not 
involved in every task. This is the principle of ‘simple structure’ or ‘simple configura¬ 
tion’ in the underlying order for any given set of attributes.” 

The other two conditions for simple structure were proposed as insurance that the 
reference hyperplanes be distinct and overdetermined by the data. Thurstone has 
since extended the original three conditions to five criteria for the determination of 
the reference vectors for a simple structure. His criteria [477, p. 335], in the language 
of the present text, are as follows: 

1. Each row of the factor matrix should have at least one zero. 

2. If there are m common factors, each column of the factor matrix should have 
at least m zeros. 

3. For every pair of columns of the factor matrix there should be several variables 
whose entries vanish in one column but not in the other. 

4. For every pair of columns of the factor matrix, a large proportion of the 
variables should have vanishing entries in both columns when there are four 
or more factors. 

5. For every pair of columns of the factor matrix there should be only a small 
number of variables with non-vanishing entries in both columns. 

In the foregoing it was tacitly assumed that the factors are uncorrelated, otherwise 
the term factor matrix would be completely ambiguous. The factor pattern and 
factor structure coincide in an orthogonal solution, so there can be no misunderstand¬ 
ing of the meaning of the “factor matrix” in that event. When the factors are oblique, 
however, the pattern is distinct from the structure and any reference to the “factor 
matrix” is vague and can be misleading. This distinction will be developed further in
chapter 13. The foregoing principles may lead to either an orthogonal or an oblique 
multiple-factor solution. The detailed distinctions are brought out in chapters 12-15. 

In general, the rotation of axes in order to arrive at simple structure may be viewed 
as an attempt to reduce the complexity of the variables. The ultimate objective would 
be a uni-factor solution, in which each variable would be of complexity one. As 
noted before, an orthogonal uni-factor solution is practically impossible with empir¬ 
ical data, and not very likely even when the factors are permitted to be oblique. 
Nonetheless, this is the ultimate objective, and it is toward that end that the simple 
structure principles are proposed for the multiple-factor solution. 

If the multiple-factor solution satisfies the five criteria listed above, then the 
graphical plot in the plane of each pair of factors will exhibit the following: (1) many 
points near the two final factor axes; (2) a large number of points near the origin; 
and (3) only a small number of points removed from the origin and between the 
two axes. When the 2-dimensional diagrams for all combinations of factors satisfy 
these three characteristics, Thurstone calls the structure “compelling,” and concludes 
[477, p. 335]: “In the last analysis it is the appearance of the diagrams that determines, 
more than any other criterion, which of the hyperplanes of the simple structure are 
convincing and whether the whole configuration is to be accepted as stable and ready 
for interpretation.” Such diagrams provide the basis for the graphical rotation 
methods of chapters 12 and 13. Various attempts to put the simple structure principles 
into objective terms are described in chapters 14 and 15, where analytical procedures 
are given for the multiple-factor solution. 
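Thurstone's original three conditions, quoted earlier in this section, are precise enough to be checked mechanically for an orthogonal solution. The sketch below is one possible reading, not a procedure from the text: the function name `thurstone_conditions` and the tolerance used to decide that an entry "vanishes" are assumptions, and the second matrix is a made-up illustration.

```python
import numpy as np
from itertools import combinations

def thurstone_conditions(V, tol=1e-8):
    """Check the three original simple structure conditions on an n x m factor matrix."""
    V = np.asarray(V, dtype=float)
    m = V.shape[1]
    zero = np.abs(V) <= tol                       # entries treated as vanishing
    c1 = bool(zero.any(axis=1).all())             # 1. every row has at least one zero
    c2 = bool((zero.sum(axis=0) >= m).all())      # 2. every column has at least m zeros
    c3 = all((zero[:, p] ^ zero[:, q]).sum() >= m # 3. per pair of columns, at least m rows
             for p, q in combinations(range(m), 2))  # vanish in one column but not the other
    return c1, c2, c3

# Pattern of the six-variable example: variables 1-3 involve both factors.
table_5_8 = [[.7, .5], [.6, .6], [.5, .8], [.7, 0], [.6, 0], [.4, 0]]
# Hypothetical uni-factor pattern in which every variable has complexity one.
uni_factor = [[.8, 0], [.7, 0], [0, .6], [0, .5]]

print(thurstone_conditions(table_5_8))
print(thurstone_conditions(uni_factor))
```

With empirical data the entries are never exactly zero, so in practice the tolerance would be set to whatever magnitude the investigator regards as negligible.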

6.3. Solutions Requiring Estimates of Communalities 

While any matrix of correlations can be factor analyzed so that there is a reasonably 
close fit to the observed data, such a solution may not yield the most desirable frame
of reference. Sometimes the solutions answer the needs of the particular investigation;
at other times, they serve as preliminary solutions to be transformed to more
meaningful solutions of the type discussed in 6.5. The solutions enumerated in this 
and the following section can all be obtained by direct calculation from the correla¬ 
tion matrix. The distinction between the solutions in this section and those in 6.4 
corresponds to the a priori choice of communalities or the number of common 
factors—a requirement of the factor model (2.9) as noted in chapter 5. Three methods 
are considered in this section. Because of the general availability of electronic 
computers, the principal-factor solution is most popular today. The centroid solution 
hit its peak in the 1930’s and 40’s, and except for specialized situations is little used 
now. Triangular diagonalization is not at all a competing method for a preliminary 
solution, but it is of interest for its mathematical simplicity and is used in the course 
of obtaining another form of solution (see 6.4, par. 3). 

1. Principal-factor solution. —The criterion of ellipsoidal fit, as described in 6.2, 
paragraph 6, determines the reference system of the principal-factor solution. The 
initial proposal to use the principal axes of the higher-dimensional ellipsoids goes 
back to Karl Pearson [386] before the subject of factor analysis was even born. More 
relevant to the present discussion is the work of Hotelling [259], who developed the
procedures for component analysis of the correlation matrix based upon the model 
(2.8). The principal-factor solution follows essentially the same procedures, but 
operates on the reduced correlation matrix (i.e., with estimates of communalities in 
the diagonal) employing the model (2.9). An important distinction between principal 
components and principal factors is that the components are immediately expressible
in terms of the observed variables while factor measurements can only be arrived at 
indirectly (see chapter 16). The general appearance and properties of a principal- 
factor solution will be indicated here, but its mathematical derivation and computing 
procedures are presented in chapter 8. 

When the factors are represented by the principal axes of the ellipsoids, each succes¬ 
sive one contributes a decreasing amount to the total communality. In other words, 
the first principal factor accounts for the maximum possible variance; the second 
factor accounts for a maximum in the residual space with the first factor removed; 
the third factor, a maximum in the residual space excluding the first two factors; and 
so on until the last common factor accounts for whatever communality remains. 
Assuming that m common factors account for the communality, a principal-factor 
pattern may be exhibited as follows: 


(6.3)

$z_1 = a_{11}F_1 + a_{12}F_2 + a_{13}F_3 + \cdots + a_{1m}F_m,$
$z_2 = a_{21}F_1 + a_{22}F_2 + a_{23}F_3 + \cdots + a_{2m}F_m,$
$z_3 = a_{31}F_1 + a_{32}F_2 + a_{33}F_3 + \cdots + a_{3m}F_m,$
$\cdots$
$z_n = a_{n1}F_1 + a_{n2}F_2 + a_{n3}F_3 + \cdots + a_{nm}F_m,$


from which the unique factors have been omitted. In such a pattern the complexity 
of each variable is equal to the total number of common factors. The first factor is an 
ordinary general factor whose coefficients $a_{j1}$ (j = 1, 2, ..., n) are all positive when
the solution is based upon a table of positive correlations. On the other hand,
approximately half the coefficients of each of the remaining factors are negative, that is,
$F_2, F_3, \ldots, F_m$ are bipolar factors.* A bipolar factor is not essentially different from
any other but is merely one for which several of the variables have significant negative 
projections. Such variables may be regarded as measuring the negative aspect of 
the usual type of factor. Thus, if a number of variables identified with “fear” are 
represented by positive projections, variables with negative projections might be 
interpreted as measuring “courage.” It would appear simpler, however, to regard 
the factor merely as “fear,” and the opposing set of variables as measures of “negative 
fear.” Of course, the signs of all the coefficients of the factor may be changed without 
altering the adequacy of the solution. Such reversal in the foregoing example would 


* This term was introduced by Cyril Burt [55]. In the present text, however, the term is used 
in a more general sense. 
lead to the interpretation of the factor as “courage,” and the subgroup of variables
with negative coefficients would be regarded as measuring “negative courage.” In
the illustrations of the text a single name for a bipolar factor is employed. This is
consistent with representing any factor by a single continuum.

Obviously a pattern of type (6.3) can reproduce a general table of correlations like 
that exhibited in Figure 6.2. The solution is thus perfectly satisfactory from a statistical 
point of view. Although a principal-factor pattern may be more complex than other 
preferred types, it may sometimes furnish a more convenient representation of a 
particular set of variables. Since the principal-factor pattern is not restricted to 
positive coefficients, it can reproduce negative as well as positive correlations. This
type of solution may then be applied to any matrix of correlations. 
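The general properties of a principal-factor pattern can be seen in a small numerical sketch. It is not the computing procedure of chapter 8: it simply takes the characteristic roots and vectors of the reduced correlation matrix with a general-purpose routine. The data are the six hypothetical variables used earlier, with their "actual" communalities in the diagonal and m = 2 assumed known. The contribution of a factor is computed as the sum of its squared coefficients.

```python
import numpy as np

# Reduced correlation matrix of the six-variable example (communalities in the diagonal).
A = np.array([[.7, .5], [.6, .6], [.5, .8],
              [.7, .0], [.6, .0], [.4, .0]])
R_reduced = A @ A.T
m = 2                                             # number of common factors (assumed known)

eigval, eigvec = np.linalg.eigh(R_reduced)        # roots returned in ascending order
order = np.argsort(eigval)[::-1]                  # rearrange to descending order
eigval, eigvec = eigval[order], eigvec[:, order]

P = eigvec[:, :m] * np.sqrt(eigval[:m])           # principal-factor coefficients
P *= np.where(P.sum(axis=0) < 0, -1.0, 1.0)       # reflect factors whose sum is negative

contributions = (P ** 2).sum(axis=0)              # contribution of each factor
print(np.round(P, 3))
print(np.round(contributions, 3))
```

The first factor has all positive coefficients, the second is bipolar, the contributions decrease, and the pattern reproduces the reduced correlation matrix. Reflecting a factor, as in the sign convention above, does not alter the fit.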

2. Centroid solution. This has been one of the most popular solutions in factor 
analysis, and was often misunderstood and misrepresented. The centroid method of 
analysis is intended to approximate the results that are obtained with the principal- 
factor method, but with considerable savings in labor. It also attempts to account 
for as much as possible of the total variance with each successive factor. However, the 
centroid solution is not unique for a given set of variables nor does it have the other 
interesting mathematical properties of the principal-factor solution. The appropriate 
perspective for the centroid method was set by Thurstone [477, p. 178]: “The centroid 
method of factoring and the centroid solution for the location of the reference axes 
are to be regarded as a computational compromise, in that they have been found to 
involve much less labor than the principal-axes solution.” Of course, the capability 
now exists (see 8.6) for obtaining a principal-factor solution in a matter of minutes.
When electronic computers are not available, the centroid method will be found very 
effective for the calculation of a preliminary solution. 

The general appearance of a centroid solution is indicated by (6.3), just as in the 
case of the principal-factor solution. For a matrix of positive correlations, the solution 
contains one general factor and the remaining are bipolar. A centroid solution can 
also be obtained for a matrix containing negative correlations, but the method be¬ 
comes somewhat more involved. A complete development of the centroid method is 
presented in 8.9. 

3. Triangular decomposition.— As noted in 3.4, the square root method may be 
used to reduce any symmetric matrix to a triangular matrix such that the product 
by its transpose is equal to the original matrix. This property corresponds to (2.50), 
the fundamental theorem of factor analysis. In other words, the triangular decomposition 
of a matrix constitutes a factor solution. 

As a formal mathematical procedure for the solution of a set of simultaneous linear 
equations or the reduction of a matrix, the square root method must have been 
discovered over and over again, and may go back to the time of Gauss. Perhaps the 
earliest application to the solution of normal equations in least squares theory was 
made by Commandant A. L. Cholesky of the French Navy around 1915, and published 
after his death by Commandant Benoit [35] in 1924. It was rediscovered by 
Banachiewicz [24] in 1938 and presented as an efficient means for solving a system 
of linear equations and as a means for calculating determinants and their inverses. 
The square root method was introduced in the American statistical literature in 1944 
by Dwyer [109] who emphasized its use in correlation and regression, and who 
showed the relationship of this method to other methods of linear computation [111]. 

Concurrent with this development of the square root method as a means of solving 
formal mathematical and statistical problems, essentially the same technique was 
being devised for factor analysis. The method was applied specifically to a correlation 
matrix by McMahon [360] prior to 1923. Then during the rapid development of 
factor analysis theory in the 1930’s, it was independently developed as the diagonal 
method by Thurstone [468, p. 78] and as the solid staircase method by Holzinger 
[234, No. 5]. A description of the square root method and its applications to factor 
analysis was presented by Harman [198] in 1954. An outline of computing procedures 
is given in 3.4. 

The n variables are described in terms of n (or possibly fewer) new uncorrelated 
factors, in the following form: 


(6.4)
$$
\begin{aligned}
z_1 &= a_{11}F_1, \\
z_2 &= a_{21}F_1 + a_{22}F_2, \\
z_3 &= a_{31}F_1 + a_{32}F_2 + a_{33}F_3, \\
&\;\;\vdots \\
z_n &= a_{n1}F_1 + a_{n2}F_2 + a_{n3}F_3 + \cdots + a_{nn}F_n.
\end{aligned}
$$


It is evident that a great many variations of this particular form of solution are possible, 
since any one of the variables may be selected to involve only one factor. For 
the full correlation matrix (with ones in the diagonal), the triangular decomposition 
(6.4) would involve all n factors. If the diagonal of the correlation matrix were replaced 
by communalities, then the reduced correlation matrix would no longer be 
positive definite and the square root process would lead to imaginary numbers if 
attempts were made to carry it to n factors. 
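
The triangular form (6.4) can be illustrated with a minimal sketch of the square root method. The function name `square_root_factor` and the three-variable correlation matrix are hypothetical, chosen only for illustration; the routine follows the usual Cholesky recurrences and is not a transcription of the computing procedure outlined in 3.4.

```python
import math

def square_root_factor(R):
    """Return lower-triangular S such that S times its transpose equals R."""
    n = len(R)
    S = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Pivot: diagonal entry less the part already accounted for by earlier factors.
        pivot = R[j][j] - sum(S[j][p] ** 2 for p in range(j))
        if pivot <= 0.0:
            # A non-positive pivot is where the hand method would meet an imaginary root.
            raise ValueError(f"non-positive pivot at variable {j + 1}")
        S[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            cross = sum(S[i][p] * S[j][p] for p in range(j))
            S[i][j] = (R[i][j] - cross) / S[j][j]
    return S

# Hypothetical full correlation matrix (ones in the diagonal).
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

S = square_root_factor(R)
reproduced = [[sum(S[i][p] * S[k][p] for p in range(3)) for k in range(3)]
              for i in range(3)]
```

Row j of `S` holds the coefficients of variable j in (6.4): the first variable involves only the first factor, the second involves two, and so on, while `reproduced` returns the original correlations.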

In practice, interest would lie in only a few factors, especially if they were the most 
important in some sense. Such a procedure, with an accompanying computer program, 
has been developed by Albert Madansky [353]. The variables are rearranged 
in such an order that a minimum amount is subtracted from each pivot element, 
thus providing the largest possible number for which the square root is taken and 
which is the denominator for the calculation of all other elements, e.g., (3.25), (3.26). 
The result is a triangular decomposition S in which the successive diagonal elements 
are the maximum possible. The most important factors then come out in sequence. 
This follows from the following argument: First, the value of the determinant of R 
is obtained from (3.30), namely, 


(6.5) |R| = |S| • |S'|. 

Since the determinant of S is merely the product of its diagonal elements (and 
similarly for S'), it follows that the product of the squares of all the diagonal elements 
of S is equal to the determinant of R. Now, if each of the factors had equal weight, 
each would contribute an amount equal to the nth root of |R|. Hence, the value of the 
nth root may be used as a cut-off point for judging the importance of factors, cutting 
off at the number of factors for which the square of the diagonal exceeds the nth root. 
In the foregoing discussion, it was tacitly assumed that R was the full correlation 
matrix. However, even when communalities are put in the diagonal of the correlation 
matrix, the procedure for rearranging the order of the variables still assures the 
determination (of the reduced number) of factors in order of importance. 
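
The cut-off argument can be sketched as follows. It is assumed here that the rearrangement amounts to choosing, at each step, the variable with the largest remaining diagonal element (ordinary diagonal pivoting); the published program [353] may differ in detail. The function name and the four-variable matrix are hypothetical.

```python
import math

def pivoted_square_root(R):
    """Square root method, pivoting on the largest remaining diagonal element.

    Returns the order in which the variables were taken and the squared
    diagonal elements of the triangular factor, in that order.
    """
    A = [row[:] for row in R]            # residual matrix, reduced at each step
    remaining = list(range(len(R)))
    order, squares = [], []
    while remaining:
        j = max(remaining, key=lambda v: A[v][v])
        pivot = A[j][j]
        if pivot <= 1e-12:               # nothing left to factor
            break
        col = {i: A[i][j] / math.sqrt(pivot) for i in remaining}
        for i in remaining:
            for k in remaining:
                A[i][k] -= col[i] * col[k]
        remaining.remove(j)
        order.append(j)
        squares.append(pivot)
    return order, squares

# Hypothetical full correlation matrix with two clusters of variables.
R = [[1.0, 0.8, 0.1, 0.1],
     [0.8, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.7],
     [0.1, 0.1, 0.7, 1.0]]

order, squares = pivoted_square_root(R)
det_R = math.prod(squares)               # by (6.5), |R| is the product of the squared diagonals
cutoff = det_R ** (1.0 / len(R))         # the nth root of |R|
n_factors = sum(1 for s in squares if s > cutoff)
```

For this matrix the squared diagonals come out in decreasing order, and only the first two exceed the nth root of |R|, so two factors would be retained under the rule described above.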

6.4. Solutions Requiring Estimate of Number of Common Factors 

As noted at the beginning of 6.3, all the solutions in that section and the present 
one are obtained by direct calculation from the correlation matrix. Such solutions 
may serve in their own right, or they may be transformed to other desirable forms (see 
6.5). It will be recalled that the solutions enumerated in the last section required 
some estimates of communalities, while a corresponding requirement for the solutions 
of the current section is an estimate of the number of common factors. The first 
two methods are applicable to any matrix of correlations and may be viewed as 
alternatives to the principal-factor procedure. The third type of solution is rather 
specialized but may have some useful applications. The remaining three solutions, 
while properly falling within the scope of this section, are presented primarily for 
historical reasons. 

1. Maximum-likelihood solution. —The reference system for the maximum- 
likelihood solution is not determined from the criteria in 6.2, except for the choice 
of the classical factor analysis model (2.9). Instead, this solution is based on fundamental 
statistical considerations. It considers explicitly the differences between the 
correlations among the observed variables and the hypothetical values in the universe 
from which they were sampled. The first concerted efforts to provide a sound statistical 
basis for factor analysis were made by Lawley [320, 321] when he suggested the use 
of the “method of maximum likelihood,” due to Fisher [128,129], in order to estimate 
the universe values of the factor loadings from given empirical data. The maximum- 
likelihood method requires an hypothesis regarding the number of common factors 
and then, according to such hypothesis, there is derived a factor solution with 
accompanying communalities. Associated with the method is a test of significance to 
determine the adequacy of the hypothesis regarding the number of common factors. 

While the mathematical development goes back to the early 1940’s, the method 
did not become practical until quite recently. Even with the electronic computers of 
the 1950’s and early 1960’s, it took considerable time to solve the rather complex 
equations involved in the maximum-likelihood method. Furthermore, convergence 
of the process cannot always be assured, although there is strong indication that it 
will do so for most practical needs. In recent years, much work has gone into the 
development of more efficient algorithms, and together with advances in computer 
technology, there is good indication that the maximum-likelihood method may offer 
real competition to the principal-factor solution for the preliminary factorization of 
a correlation matrix. 
A maximum-likelihood solution has the same general appearance (6.3) as a 
principal-factor solution; but it does not have the latter’s property of accounting for 
a maximum amount of variance for a specified number of factors. Also, while a 
principal-factor solution is unique for a given body of data, a maximum-likelihood 
solution only determines the common-factor space uniquely. In other words, one 
maximum-likelihood solution differs from another by a rotation. To remove this 
inherent indeterminacy, the computing algorithm must provide some side condition 
which fixes the particular solution. 

2. Minres solution. —A method of factor analysis which minimizes residuals 
(hence, the name “minres”) was developed recently [207]. Specifically, this method 
estimates factor loadings in such a way as to make the sum of squares of off-diagonal 
residuals of the correlation matrix a minimum. In so doing, it literally follows the 
second objective enumerated in 2.3, i.e., to “best” reproduce the observed correlations. 
This is in contrast to the principal-factor method which extracts the maximum 
variance. Again, however, a minres solution has the same general appearance (6.3) 
as a principal-factor solution. The minres solution is dependent upon an estimate of 
the number of common factors; the communalities, consistent with this hypothesis, 
are obtained as by-products of the method. These properties are common to the 
minres and maximum-likelihood solutions. 
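
The quantity that a minres solution makes a minimum can be written down directly. The sketch below only evaluates the criterion for a given set of loadings; it does not carry out the minimization. The function name, the correlation matrix, and the trial loadings are hypothetical, and the sum is taken over pairs j < k, which differs from the sum over all off-diagonal cells only by a factor of two.

```python
import math

def minres_objective(R, A):
    """Sum of squared off-diagonal residuals for loadings A (one row per variable)."""
    n = len(R)
    total = 0.0
    for j in range(n):
        for k in range(j + 1, n):
            reproduced = sum(ajp * akp for ajp, akp in zip(A[j], A[k]))
            total += (R[j][k] - reproduced) ** 2
    return total

# Hypothetical correlations; the diagonal never enters the criterion.
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

rough = [[0.7], [0.7], [0.7]]                       # a crude one-factor guess
a1 = math.sqrt(0.6 * 0.5 / 0.4)                     # loadings that fit these correlations exactly
exact = [[a1], [0.6 / a1], [0.5 / a1]]

f_rough = minres_objective(R, rough)
f_exact = minres_objective(R, exact)
communalities = [sum(a * a for a in row) for row in exact]   # by-product of the loadings
```

The communalities are obtained from the fitted loadings rather than supplied in advance, which is the sense in which they are by-products of the method.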

When one considers that the primary objective of the minres solution is identical 
with the objective of the classical factor analysis approach in general, the question 
immediately arises why the method was not developed before. The simple answer is 
that it could not have been accomplished without the power of the modern-day 
computer. Now that the minres solution is a practical procedure it may be preferable 
to either the principal-factor or the maximum-likelihood solution. 

3. Multiple-group solution. —Unlike the two preceding methods which serve 
primarily as preliminary solutions for subsequent “rotation” to the multiple-factor 
solution, the multiple-group solution usually would be retained as the final form. 
What it has in common with all the other methods of this section is the requirement 
that the number of factors be estimated in advance; but it requires estimates of 
communalities as well. The basic concept in the multiple-group solution is that of 
grouping of variables. Either arbitrary or selected groups of variables provide the 
basis for the common factors (equal to the number of groups) to be extracted. 

The common factors extracted in a single operation can be expected to be oblique 
to one another. Then the complete analysis should include the factor pattern, the 
factor structure, and the matrix of correlations among the factors. If it is desired to 
rotate the results to some other solution, the problem is simplified if an orthogonal 
frame of reference is obtained first. The orthogonal factor pattern and the transformation 
matrix from the oblique to the orthogonal solution, as well as the oblique factor 
pattern can all be obtained very efficiently by applying the square root method to the 
matrix of factor correlations and the oblique factor structure. 

4. Simple factor models. —Another class of solutions is dependent on a prior 
decision regarding the number of factors and therefore deserves mention in this 
section. They are not alternatives for the preceding methods, but are of interest 
primarily from an historical viewpoint. Their common attribute is the quest for 
simplicity (in a sense they represent the ultimate in parsimonious description), 
but unfortunately their attainment can only be expected in very specialized circumstances. 

The first, a uni-factor solution, certainly must be considered an ideal form hardly 
to be expected to fit any empirical data. Such a solution satisfies the standard of 
linear fit, and hence minimum complexity. For the vectors representing the variables 
to lie exactly on the reference axes, so that each variable would measure but a single 
factor, implies a matrix of correlations with properties as shown in Figure 6.1. While 
it is highly unlikely that empirical data would yield an orthogonal uni-factor solution, 
such an oblique solution might be approximated. 

In the early days of factor analysis, very simple factor models were proposed in 
defense of certain theories in psychology. Thus, when Charles Spearman [438,440] 
developed the psychological theory that all intellectual activity consisted of a 
“general” function and a “specific” function for each element, it conformed to the 
uni-factor model, albeit in a very limited sense. For in this case all the variables of a 
set are describable in terms of a single common factor, constituting the simplest 
instance of a uni-factor solution which could arise from a group of variables with 
correlations depicted by a single triangle in Figure 6.1 and which satisfy the conditions 
(5.10) for one general factor. Spearman’s “Two-factor Theory” was named from the 
two kinds of factors—general and specific. In recent years, it has been the practice to 
use the adjectival descriptors of different factor solutions to denote the number (and 
sometimes the type) of common factors only, it being understood that the factor 
model (2.9) of common and unique factors applies to all. In this sense, Spearman’s 
two-factor solution is a uni-factor or “single-factor” solution, but the old name is 
retained for historical reasons. 

An attempt to meet the inadequacies of the preceding two-factor theory was made 
by Holzinger [234] in his proposal of the “Bi-factor Theory”. The inadequacies 
became apparent in the 1930’s when the complex psychological test batteries came 
into being. In developing a broader theory to cope with the greater demand, Holzinger 
nevertheless was guided by Spearman’s earlier work. While he claimed no more for 
his bi-factor theory than an extension of the two-factor theory, it is in fact an alternative 
multiple-factor theory as well. 

The essence of the bi-factor solution is that it includes a general factor in addition 
to uncorrelated group factors and unique factors. In this way the limitation of the 
two-factor solution is overcome, i.e., while an orthogonal uni-factor solution can 
only reproduce a matrix of correlations of the form in Figure 6.1, a bi-factor solution 
can fit a more general set of correlations as indicated in Figure 6.2. This modification 
of the uni-factor pattern is tantamount to substituting planar for linear geometric 
fit. When the vectors of a group of variables lie in a reference plane, each variable 
measures just two factors. If, furthermore, the vectors lie only in the reference planes 
formed by a general-factor ($F_0$) axis and one group-factor axis (none in the planes of 
two group factors), the configuration can be described as a pencil of planes through 
the $F_0$ axis.* 

6.5. Multiple-Factor Solutions 

In contrast to the solutions of the last two sections, which are calculated directly 
from a correlation matrix, the solutions considered in the present section can only be 
derived from some preliminary solution. Multiple-factor solution is a generic term 
originated by Thurstone in the 1930’s, and is used primarily to distinguish a general 
type of solution involving m common factors without a prior hypothesis regarding 
the grouping of variables to constitute group factors. 

The essential features of a multiple-factor solution include “overlapping” group 
factors and avoidance of a general factor. By “overlapping” is meant that the several 
group factors do not involve distinct subsets of variables, i.e., several group factors 
appear in the description of a variable. Overlapping thus implies complexity of two 
or more for the variables in a multiple-factor pattern, and hence the prefix “multiple” 
is employed. Through the overlap of group factors, a table of correlations as in 
Figure 6.2 can be fit by such a solution. 

A multiple-factor solution satisfies the criterion of hyperplanar fit.† In analytical 
terms this means that the complexities of the variables should not exceed m - 1, 
where m is the number of common factors. While this is the maximum complexity, it 
is desirable to obtain a solution in which the description of each variable can be made 
as simple as possible. Also, in spite of the necessary overlapping of factors, it is 
desirable to have as many zero coefficients in the columns as possible. More specific 
criteria for a multiple-factor solution are the “simple structure principles” of 6.2, 
paragraph 10. 

Any matrix of correlations which can be factored into some preliminary solution 
could presumably be transformed into a multiple-factor solution, either orthogonal 
or oblique. Such a solution usually will have many zero coefficients. However, the 
particular configuration of zeros and overlap of group factors is not required for a 
multiple-factor solution, in general. As noted before, graphical methods (as indicated 
in chaps. 12 and 13) have been used in the past and more objective methods (see 
chaps. 14 and 15) are now available for the determination of multiple-factor solutions 
satisfying the simple structure principles. 

Before leaving the enumeration of the different types of factor solutions, one more 
should be mentioned. Several statistical techniques are available for fitting a prescribed 
factor pattern [see 286, 287, 416]. The object of these procedures is to obtain a 
multiple-factor type of solution by postulating in a preliminary solution where the 
positions of the zeros and large loadings should be in the desired factor pattern, 
rather than the transformation of the preliminary solution according to certain 
criteria of simple structure. It should be noted that a bi-factor solution also satisfies a 
prescribed factor pattern, but its special simplicity and mode of computation set it 
apart from the more general treatment. However, a bi-factor solution can be determined 
as a special case of the general methods for fitting a prescribed factor pattern. 

* In ordinary space geometry a pencil of planes refers to the totality of planes through a line, 
i.e., all the planes linearly dependent on two distinct planes through the line. In the present 
setting, however, a pencil of planes in a space of (m + 1) dimensions will refer to all the planes 
... orthogonal. It is clearly seen that there are m such planes. 

† With special reference to psychology, Thurstone has called the factors determined by hyperplanar 
fit primary factors, and the corresponding psychological attributes, “primary abilities.” 

6.6. Summary of Factor Solutions 

More than ten distinct types of factor solutions have been described in this chapter, 
but they are not equally “preferred” by workers in the field. Some are presented 
primarily because of their historical significance, some because they meet particular 
needs, and some because of their general applicability. The first requirement of a 
factor analysis is for the solution to adequately explain the interrelationships among 
many variables. Thus, there should be good statistical fit of the reproduced correlations 
to the observed correlations. Secondly, after statistical fit is satisfied, it may be 
desirable to simplify the factorial results and make them more meaningful in a 
particular subject-matter field. 

With the general availability of computers, a typical approach of many investigators 
is to obtain a principal-factor solution for an observed correlation matrix, and then 
to transform it to the varimax multiple-factor solution (see chap. 14). One should 
not overlook other possibilities, however. The increased capabilities of computers 
make the maximum-likelihood solution more practical than in the past, and it may be 
preferred to the principal-factor method. Similarly, the new minres solution may turn 
out to yield the most desirable initial factorization of a correlation matrix. The 
multiple-group method may provide a useful final solution or it may also serve as a 
preliminary factorization for subsequent transformation to the multiple-factor form. 
For the (transformed) multiple-factor solution there is the choice between orthogonal 
and oblique factors, and then several alternative methods (see chaps. 14 and 15). 

A summary of the different solutions is presented in Table 6.1; the assumptions, 
properties, and characteristics of each type are outlined briefly, and the important 
distinctions are brought out. It may also serve as a convenient reference source after 
the detailed development and computing procedures have been studied in the 
ensuing chapters. 

The choice of one form of solution rather than another may take into consideration 
particular hypotheses or theories of a content area, as well as the standards 
presented in this chapter. Thus, if in the field of biology, a general growth factor were 
postulated, then a factor analysis of a set of body measurements in the bi-factor form 
would be appropriate. On the other hand, if a general factor were denied by a theory 
in a certain field, then the multiple-factor form would be appropriate. Examples of 
well known psychological theories of human abilities are Spearman’s Two-factor 
Theory, Holzinger’s Bi-factor Theory, and Thurstone’s Primary Factor Theory. The 
factor analysis techniques which were developed in support of such theories turned 
out to have greater applicability than the specific purposes for which they were 
originally conceived. 
It should be remembered that when a particular theory is found to be compatible 
with experimental evidence, it does not follow that nature does in fact behave precisely 
as stated. No doubt, an alternative theory may be postulated which is also compatible 
with the data. The mathematical expressions (e.g., the factor patterns) of these theories 
may be different because of adherence to different criteria. Nonetheless, if both are 
consistent with the empirical evidence they are equally adequate on statistical 
grounds, and any preference must be made on other considerations. Certainly, it is 
wrong to argue that a particular factor solution is incorrect because it appears different 
from another. Factor analysis, as a statistical technique, yields solutions which 
are convertible from one to another, and a preference of form must depend upon 
appropriate content criteria. As in all empirical sciences, several equally satisfactory 
laws may be usefully employed, although they may be quite different formally. 
7 

Simple Factor Models 


7.1. Introduction 

In part i of this text, the foundation and formal groundwork was laid for factor 
analysis. The direct methods of analysis (in contradistinction to transformation 
methods) are developed in chapters 7 through 11. Certain techniques which were 
very useful in the early development of factor analysis are presented in this chapter, 
followed by more modern and powerful techniques for analyzing any correlation 
matrix in subsequent chapters. Because of its well-deserved pride of place in factor 
analysis, Spearman’s two-factor solution is treated first. 

From the present day perspective it seems only natural that the first approach to 
factor analysis should have involved the simplest possible model. That is in fact 
what happened, in 1904, when Spearman formulated a single common factor theory 
of intellective ability. According to his famous theorem (see sec. 5.3), when the tetrads 
vanish every variable can be described in terms of a general factor g and a specific 
(now called, unique) factor s. The notion of a general factor was carried beyond its 
actual value as a single common factor which accounted for all the intercorrelations 
of a particular matrix to a psychological theory that g entered into all abilities. 
Actually, Spearman had cautioned [440, p. 76] that such a universal law “... must 
acquire a much vaster corroborative basis before we can accept it even as a general 
principle and apart from its inevitable corrections and limitations.” 

The two-factor solution is developed in 7.2, and followed in 7.3 by an interesting 
situation where all the conditions for a single common factor are met but the resulting 
solution is inappropriate nonetheless. In section 7.4 an index is developed for 
judging the degree of cohesiveness of variables. 

As is all too evident, the two-factor solution is very limited in its application since 
it implies a correlation matrix of unit rank. Aside from the controversial psychological 
theory involving a single general factor “g”, the method of analysis was inadequate 
for the correlation matrices resulting from the complex psychological test batteries. 


Holzinger, who had been one of the chief proponents of Spearman’s “two-factor 
theory”, developed the “bi-factor theory” to cover the more complex situations. His 
bi-factor “theory” of psychological behavior provides for the splitting of Spearman’s 
single general factor into a general and several group factors, so that all variables 
can be described in terms of a general factor, a group factor, and a unique factor. 
The analysis of a correlation matrix leading to a bi-factor solution is presented in 
7.5, and illustrated with a 24-variable example in 7.6. 

7.2. Two-Factor Solution 

According to Spearman’s fundamental theorem, the necessary and sufficient 
conditions for a set of n variables to be describable in terms of just one general factor 
and n unique factors are the vanishing of all tetrads, namely: 

(7.1)    $r_{jk}r_{lm} - r_{lk}r_{jm} = 0 \qquad (j, k, l, m = 1, 2, \ldots, n;\ j \neq k \neq l \neq m).$

The number of such conditions increases very rapidly with the size of n, as indicated 
by (5.12). Actually, not all of these conditions are independent, and as shown in 5.3, 
there are n(n — 3)/2 equations of condition for one general factor among n variables. 
In the early days of factor analysis sound statistical tests were made before the factor 
model was accepted. In other words, all tetrad differences were computed from a 
table of correlations and the median of such a frequency distribution would be 
compared with its probable error. Formulas for the sampling errors of tetrad 
differences were developed by Spearman and Holzinger [444]. 
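
A minimal sketch of the tetrad computation follows. For every set of four variables the three distinct tetrad differences are formed; the function name and the loadings used to build the example correlations are hypothetical. Because the correlations are generated from a single set of general-factor loadings, every tetrad difference should vanish apart from rounding error.

```python
from itertools import combinations

def tetrad_differences(R):
    """All tetrad differences of the off-diagonal correlations in R."""
    out = []
    for j, k, l, m in combinations(range(len(R)), 4):
        p1 = R[j][k] * R[l][m]
        p2 = R[j][l] * R[k][m]
        p3 = R[j][m] * R[k][l]
        # Three distinct differences for each set of four variables.
        out.extend([p1 - p2, p1 - p3, p2 - p3])
    return out

# Hypothetical loadings on one general factor; r_jk = a_j * a_k off the diagonal.
a = [0.9, 0.8, 0.7, 0.6, 0.5]
R = [[1.0 if j == k else a[j] * a[k] for k in range(5)] for j in range(5)]

tetrads = tetrad_differences(R)
largest = max(abs(t) for t in tetrads)
```

With observed data the differences would not be exactly zero, and their distribution would be compared with the sampling error as described above.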

The two-factor pattern may be written as follows: 

(7.2)    $z_j = a_{j0}F_0 + d_jU_j \qquad (j = 1, 2, \ldots, n),$

where $F_0$ is the general factor and the $U_j$ are the n unique factors, and again, as in 
the case of (2.9), the prime on $z_j$ has been dropped for simplicity. When the correlations 
$r_{jk}$ satisfy the tetrad conditions, or the equivalent conditions (5.10), the pattern (7.2) 
may be assumed and the coefficients $a_{j0}$ and $d_j$ have to be determined. Formulas 
will be developed for the computation of the $a_{j0}$ under the assumption that the 
residuals vanish, i.e., 

(7.3)    $r_{jk} - r'_{jk} = 0 \qquad (j, k = 1, 2, \ldots, n;\ j \neq k).$

The correlations reproduced from the pattern (7.2) are given by 

(7.4)    $r'_{jk} = a_{j0}a_{k0},$

and, under the assumption that the residuals vanish, the observed correlations may 
be written 

(7.5)    $r_{jk} = r'_{jk} = a_{j0}a_{k0}.$

In the remainder of this section the reproduced correlations will be replaced 
by observed correlations. Upon multiplying equation (7.5) by the square of the 
general-factor coefficient for any variable $z_e$, this equation becomes 

(7.6)    $a_{e0}^2 r_{jk} = a_{e0}^2 a_{j0}a_{k0} = (a_{e0}a_{j0})(a_{e0}a_{k0}) = r_{ej}r_{ek},$

and, summing over the correlations, there results 

(7.7)    $a_{e0}^2 \sum_{j<k=1}^{n} r_{jk} = \sum_{j<k=1}^{n} r_{ej}r_{ek} \qquad (e \text{ is fixed};\ j, k \neq e).$

The symbol $\sum_{j<k=1}^{n} r_{jk}$ stands for the sum of all the correlations $r_{jk}$, where j and k 
each range over the variables 1, 2, ..., n but subject to the restriction that j is always 
less than k. In a symmetric matrix of correlations, this sum is merely the total of all 
the entries above (or below) the principal diagonal. 

Since there is only one common factor, the coefficient of this factor for any variable 
is merely the square root of the communality of the variable. Hence formula (7.7) 
may be written explicitly for the square of the coefficient, or the communality, of 
any variable $z_e$, as follows: 

(7.8)    $a_{e0}^2 = h_e^2 = \dfrac{\sum (r_{ej}r_{ek};\ j, k = 1, 2, \ldots, n;\ j, k \neq e;\ j < k)}{\sum (r_{jk};\ j, k = 1, 2, \ldots, n;\ j, k \neq e;\ j < k)}.$

It will be observed that the diagonal elements of the correlation matrix do not enter 
into formula (7.8). In fact, the formula yields values of the communalities, which 
theoretically are the diagonal elements preserving the unit rank of the matrix. When 
the conditions (5.10) are satisfied by the observed correlations a single factor is 
postulated. The computed diagonal elements must also satisfy these conditions in 
order that the rank of the reduced correlation matrix be unity. 

Formula (7.8) is actually much simpler than it appears. The denominator contains 
the sum of all correlations excluding the given variable e, while the numerator contains 
the sum of paired products of correlations of the given variable with each of the 
other variables. For example, the square of the general factor coefficient for the first 
variable in a set of five is given by: 

(7.9)    $a_{10}^2 = \dfrac{r_{12}r_{13} + r_{12}r_{14} + r_{12}r_{15} + r_{13}r_{14} + r_{13}r_{15} + r_{14}r_{15}}{r_{23} + r_{24} + r_{25} + r_{34} + r_{35} + r_{45}}.$


While formula (7.8) is simple enough, it can be expressed in a more convenient 
form for computational purposes, namely: 


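(7.10)    $a_{e0}^2 = \dfrac{\left(\sum_{j=1}^{n} r_{ej}\right)^2 - \sum_{j=1}^{n} r_{ej}^2}{2\left(\sum_{j<k=1}^{n} r_{jk} - \sum_{j=1}^{n} r_{ej}\right)} \qquad (e \text{ is fixed},\ j \neq e).$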


The adaptability of this formula to machine calculation will be clear from the following 
restatements of the terms in the formula. If $R_0$ is the matrix of correlations with 
the elements in the principal diagonal omitted, then 

$\sum_{j=1}^{n} r_{ej}$ is the sum of the correlations in column e of $R_0$, 

$\sum_{j=1}^{n} r_{ej}^2$ is the sum of squares of the correlations in column e of $R_0$, 

$\sum_{j<k=1}^{n} r_{jk}$ is the sum of all the correlations below the diagonal of $R_0$. 

After the communalities have been obtained, the unique variances can be determined. 
As pointed out in 2.4, the uniqueness for variable e is given by 

$d_e^2 = 1 - h_e^2.$

If, in addition, the reliability coefficients of the variables are known, the uniqueness 
may be split into error variance and specificity by means of the last two of formulas 
(2.20). 

An illustrative example will clarify the method of analysis described in this section. 
The correlations among five variables are presented in Table 7.1. Spearman [440] used 


Table 7.1 

Correlations among Five Tests, and Calculation of General Factor Coefficients 

Test                                  1        2        3        4        5
1. Mathematical judgment
2. Controlled association           .485
3. Literary interpretation          .400     .397
4. Selective judgment               .397     .397     .335
5. Spelling                         .295     .247     .275     .195

Sum: $\sum r_{ej}$                 1.577    1.526    1.407    1.324    1.012
Sum of squares: $\sum r_{ej}^2$    .6399    .6115    .5055    .4655    .2617
Denominator of (7.10)              3.692    3.794    4.032    4.198    4.822
$a_{e0}^2$                         .5003    .4526    .3656    .3067    .1581
$a_{e0}$                            .707     .673     .604     .554     .398


this example to demonstrate the applicability of his “two-factor theory”. He computed 
all the tetrads and found their median to be .013. The probable error of a 
tetrad difference for a sample of the given size was found to be .011; and from this 
agreement, Spearman concluded that the observed data were adequately fit by the 
simple theory. Using the computing formula (7.10), the necessary sums for each 
variable are recorded in the lower part of Table 7.1, with the general-factor coefficients 
in the last line of the table. 
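
The lower part of Table 7.1 can be checked with a short sketch of the computing formula (7.10). The correlations are those of the table; the function name `general_factor_communalities` is hypothetical.

```python
import math

def general_factor_communalities(R):
    """Squared general-factor coefficients by the computing formula (7.10)."""
    n = len(R)
    total = sum(R[j][k] for j in range(n) for k in range(j + 1, n))  # all r_jk, j < k
    h2 = []
    for e in range(n):
        col = [R[e][j] for j in range(n) if j != e]
        s = sum(col)                       # sum of correlations in column e
        ss = sum(r * r for r in col)       # sum of their squares
        h2.append((s * s - ss) / (2.0 * (total - s)))
    return h2

# Correlations of Table 7.1, entered by rows below the diagonal.
lower = [[],
         [0.485],
         [0.400, 0.397],
         [0.397, 0.397, 0.335],
         [0.295, 0.247, 0.275, 0.195]]
R = [[0.0] * 5 for _ in range(5)]
for j, row in enumerate(lower):
    for k, r in enumerate(row):
        R[j][k] = R[k][j] = r

h2 = general_factor_communalities(R)
loadings = [math.sqrt(v) for v in h2]
```

The values in `h2` agree with the tabled squares to four decimals, and `loadings` agree with the last line of the table to within one unit in the third decimal.
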
7.3. The Heywood Case 

In the preceding section, the theory and computational methods were described 
for the solution of a set of variables in terms of a single common factor. As indicated 
above, the fundamental conditions for such a solution are the vanishing of all tetrads, 
i.e., all second-order minors (not involving a diagonal element) of the correlation 
matrix must be zero. It is possible for such conditions to be met (ideal rank, in the 
sense of 5.4, being unity), while one of the diagonal elements must be greater than 
unity in order for the reduced correlation matrix to be of rank one. This apparently 
startling situation is referred to as the “Heywood” case [229]. 

The classical example to illustrate the Heywood case consists of the correlations 
in Table 7.2. It can be verified readily that all the tetrads vanish, but upon applying 


Table 7.2 


Hypothetical Example Illustrating the Heywood Case 



formula (7.10) the solution in the middle portion of Table 7.2 results. This solution 
reproduces the correlations perfectly, but it lacks one basic requirement to be a 
legitimate solution—the communalities must be positive numbers between 0 and 1. 
Because one of the communalities is greater than one, this solution is a Heywood 
case. 
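
The situation can be reproduced numerically. The sketch below does not use the entries of Table 7.2; it builds a hypothetical set of correlations from five made-up loadings, one of which exceeds unity, and then recovers the communalities by averaging triads (the method of 5.3) rather than by formula (7.10). All names and numbers are assumptions chosen for illustration.

```python
from itertools import combinations, permutations

# Hypothetical single-factor loadings (NOT the entries of Table 7.2).
# The first one exceeds unity on purpose.
LOADINGS = [1.05, 0.90, 0.80, 0.70, 0.60]
N = len(LOADINGS)

# Correlations implied by one common factor: r_jk = a_j * a_k.
# The diagonal cells of R are never referenced below.
R = [[LOADINGS[j] * LOADINGS[k] for k in range(N)] for j in range(N)]


def max_abs_tetrad(r):
    """Largest |r_ab*r_cd - r_ad*r_cb| over four distinct variables."""
    return max(abs(r[a][b] * r[c][d] - r[a][d] * r[c][b])
               for a, b, c, d in permutations(range(len(r)), 4))


def communality_by_triads(r, e):
    """Average of the triads r_ej*r_ek/r_jk over pairs j < k, both different from e."""
    others = [j for j in range(len(r)) if j != e]
    triads = [r[e][j] * r[e][k] / r[j][k] for j, k in combinations(others, 2)]
    return sum(triads) / len(triads)


communalities = [communality_by_triads(R, e) for e in range(N)]
largest_correlation = max(R[j][k] for j, k in combinations(range(N), 2))

print("max |tetrad|        :", max_abs_tetrad(R))
print("largest correlation :", round(largest_correlation, 3))
print("communalities       :", [round(h, 4) for h in communalities])
```

Every correlation in this made-up example is below one and every tetrad vanishes (up to rounding), yet the first communality comes out above one, which is exactly the Heywood situation described above.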

While the ideal rank of the correlation matrix of Table 7.2 is one, the actual rank 
of the matrix with acceptable numbers in the diagonal certainly exceeds one. If more 
than one common factor is permitted, an infinitude of solutions is possible in which 
no communality is greater than one. An example of a solution in terms of five common 
factors is given in the right-hand portion of Table 7.2. In this case each communality 
is just under one. 

In case the ideal rank is greater than one, but lower than the rank of the reduced
correlation matrix unless one of the diagonal values exceeds unity, the resulting solution
is referred to as the generalized Heywood case. In all such instances, when a
communality greater than one appears, there is clear evidence that the rank
of the correlation matrix must be higher than that which produced the Heywood
case. Guttman [177] proves that there are many cases where the rank of all diagonal-free
submatrices of the correlation matrix is small, but the minimum rank which
preserves the Gramian properties is nevertheless very large compared with the
number of variables. He concludes that merely studying the minors outside the main
diagonal is not sufficient for determining communalities or the minimum possible
rank.


7.4. Grouping of Variables 

In the bi-factor method, and some of the subsequent methods of analysis (e.g., 
chap. 11), a prime consideration is the grouping of variables into subsets. Often, 
the study will be so designed that three, four, or more variables are of a kind which 
might measure the same factor. Such a hypothetical design of the variables is tested 
by the factor analysis, which provides the evidence for retaining or rejecting the 
original grouping of variables. 

When factor analysis is employed as a tool in developing theories of behavior in a 
particular content field, then hypothetical grouping of variables is usually based 
upon previous research in which some of the factors have already been identified. 
The design can then be extended to include other groups of variables used to identify
additional factors. In some cases it may be desirable to take a portion of a previous 
design and add new variables to obtain more refined measures of the factors already 
identified. The success of such a factor analysis depends in large measure on the skill 
with which the variables have been selected for the groups. 

On the other hand, when factor analysis is used simply as a statistical tool in the
simplification and interpretation of a correlation matrix, and grouping of variables
is required, then some objective means of getting such groupings from the matrix
itself would seem to be indicated. Such a procedure is readily available under the
assumption that the variables of a group identifying a factor have higher intercorrelations
among themselves than with the other variables of the total set. The index used for
this purpose is designated as the “B-coefficient” or “coefficient of belonging” [243, p. 24],
and is defined as 100 times the ratio of the average of the intercorrelations among the
variables of a group to their average correlation with all remaining variables.

To distinguish between variables of different groups, the standard set-theory 
notations can be adapted to the present needs. The primary definitions follow: 


(7.11)

(a) $e \in G_p$ means $e$ is a variable in the group $G_p$.

(b) $(z_j;\ j \in G_p,\ p = 1, 2, \dots, m)$ denotes the system of elements $z_j$ for all
values of $j$ in groups $G_p$, where the range of $p$ is indicated. The elements
of the system are first designated, followed by a semicolon, and all the
properties on the elements are to the right of the semicolon.

(c) $\sum (z_{ji};\ i = 1, 2, \dots, N)$ means the sum of all the elements in the system
$(z_{ji};\ i = 1, 2, \dots, N)$. The index $j$ is fixed, the summation extending
over $i$. This sum is equivalent to the more conventional form $\sum_{i=1}^{N} z_{ji}$.

It will be found that these definitions aid in clarity and ultimate simplicity in describing
much of the following theory.

Employing this notation, the B-coefficient may be expressed as follows:

(7.12)    $B(j) = 100\,(S/n_S)/(T/n_T), \qquad (j \in G_p,\ p = 1, 2, \dots, m)$

where the variables $j$ are said to comprise a group $G_p$, the sum of intercorrelations
among the variables of the group is given by

(7.13)    $S = \sum (r_{jk};\ j, k \in G_p,\ j < k)$

and the sum of the correlations of the variables in the group with all remaining variables is given by

(7.14)    $T = \sum (r_{jk};\ j \in G_p,\ k \notin G_p),$

while $n_S$ and $n_T$ are the numbers of correlations in the sums $S$ and $T$, respectively.

The B-coefficients are used to sort the variables on the basis of their intercorrelations.
The grouping is begun by selecting the two variables which have the highest
correlation. To these is added the variable for which the sum of the correlations with
the preceding is highest. This process is continued, always adding a variable which
correlates highest with those already in the argument of B, until a sharp drop appears
in the value of B. When this occurs, the last variable added is withdrawn from the
group. Another variable may be inserted in its place, but, if the drop in B is still large,
it should be withdrawn. Thus a group of variables that belong together is determined.
Then, excluding the variables that have already been assigned to such a group, the
two others which have the highest remaining correlation are selected to start another
group. To these variables are added others, exclusive of those that have already
been assigned to groups, until a significant drop appears in B, at which stage another
group is formed. It is desirable to start each new group with a B-coefficient as large
as possible so as to have clearly defined groups. For this purpose it may be necessary
to obtain the B-coefficients for more than one pair of variables without completing
the groups. The pair yielding the highest B value may then be used to introduce the
next group. This process is continued until all variables have been assigned to groups
or else do not fit into any group.
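
The sorting procedure just described can be sketched in code. What follows is a simplified illustration, not the procedure as actually practised: the "sharp drop" is replaced by an assumed numeric threshold `max_drop`, the step of trying another variable in place of a withdrawn one is omitted, and the correlation matrix `R` is hypothetical. All function names are invented for the example.

```python
def b_coefficient(R, group):
    """100 * (mean correlation inside the group) / (mean correlation with the rest)."""
    n = len(R)
    inside = [R[j][k] for j in group for k in group if j < k]
    outside = [R[j][k] for j in group for k in range(n) if k not in group]
    return 100.0 * (sum(inside) / len(inside)) / (sum(outside) / len(outside))


def grow_group(R, available, max_drop):
    """Grow one group, starting from the most highly correlated available pair."""
    _, j, k = max((R[j][k], j, k) for j in available for k in available if j < k)
    group = [j, k]
    b = b_coefficient(R, group)
    while len(group) + 1 < len(R):      # B is undefined once nothing is left outside
        rest = [x for x in available if x not in group]
        if not rest:
            break
        # Candidate: highest sum of correlations with the variables already chosen.
        candidate = max(rest, key=lambda x: sum(R[x][g] for g in group))
        b_new = b_coefficient(R, group + [candidate])
        if b - b_new > max_drop:        # assumed stand-in for a "sharp drop"
            break
        group.append(candidate)
        b = b_new
    return group, b


def group_variables(R, max_drop=20.0):
    """Repeat grow_group on the variables not yet assigned to any group."""
    available = list(range(len(R)))
    groups = []
    while len(available) >= 2:
        group, b = grow_group(R, available, max_drop)
        groups.append((group, round(b)))
        available = [x for x in available if x not in group]
    return groups


# Hypothetical correlations: variables 0-2 and 3-5 form two clusters.
R = [
    [1.00, 0.70, 0.60, 0.20, 0.20, 0.20],
    [0.70, 1.00, 0.60, 0.20, 0.20, 0.20],
    [0.60, 0.60, 1.00, 0.20, 0.20, 0.20],
    [0.20, 0.20, 0.20, 1.00, 0.55, 0.50],
    [0.20, 0.20, 0.20, 0.55, 1.00, 0.50],
    [0.20, 0.20, 0.20, 0.50, 0.50, 1.00],
]
GROUPS = group_variables(R)
print(GROUPS)
```

A fixed threshold is a convenience of the sketch only; in the text the decision on whether a drop is large remains a matter of judgment.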


If, in the course of determining a group of variables that belong together, the
number of variables entering into the argument of B is designated by $v$, then

(7.15)    $n_S = \binom{v}{2} = v(v - 1)/2, \qquad n_T = v(n - v)$

and the basic definition (7.12) of B may be put in the form:

(7.16)    $B(j) = 200\,(n - v)\,S/[(v - 1)\,T]$
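
The step from (7.12) to (7.16) is a direct substitution of the two counts in (7.15). Written out as a check on the algebra, using only the symbols already defined:

```latex
\begin{aligned}
B(j) &= 100\,\frac{S/n_S}{T/n_T}
      = 100\,\frac{S}{T}\cdot\frac{n_T}{n_S} \\
     &= 100\,\frac{S}{T}\cdot\frac{v(n-v)}{v(v-1)/2}
      = \frac{200\,(n-v)\,S}{(v-1)\,T}.
\end{aligned}
```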


While this formula is used in calculating B-coefficients, several auxiliary formulas
facilitate the computations. First, another expression for $T$ in place of (7.14) will be
found convenient, namely:

(7.17)    $T = \sum (r_{je};\ j \in G_p,\ e = 1, 2, \dots, n,\ j \neq e) - 2S.$

Since the sums of the correlations of each variable with all others are usually obtained
at the start of any analysis, these sums for the variables in the particular group less
twice their intercorrelations yield the sum $T$.

Another aid in the computation of the B-coefficients is the sum of the correlations
of the last variable added to the group with the preceding ones. Letting $l$ denote the
last variable added to the group, the proposed sum may be written

(7.18)    $L = \sum (r_{jl};\ j \in G_p,\ j \neq l).$

If a subscript $v$ is appended to the sums $L$, $S$, and $T$ to designate their values for $v$
variables in the argument of B, then successive values of $S$ and $T$ may be obtained
by means of the recurrence formulas:

(7.19)    $S_v = S_{v-1} + L_v$

and

(7.20)    $T_v = T_{v-1} + \sum (r_{el};\ e = 1, 2, \dots, n,\ e \neq l) - 2L_v.$
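
That the recurrence formulas give the same sums as the definitions can be checked numerically. The sketch below uses a hypothetical 5-variable correlation matrix and invented function names; it builds $S$ and $T$ one variable at a time by (7.17) to (7.20) and compares them with the direct sums (7.13) and (7.14).

```python
def row_sum(R, j):
    """Sum of the correlations of variable j with all other variables."""
    return sum(R[j][e] for e in range(len(R)) if e != j)


def direct_S_T(R, group):
    """S by (7.13) and T by (7.14), straight from the definitions."""
    n = len(R)
    S = sum(R[j][k] for j in group for k in group if j < k)
    T = sum(R[j][k] for j in group for k in range(n) if k not in group)
    return S, T


def recurrent_S_T(R, order):
    """S and T after each addition, built up by (7.17)-(7.20)."""
    first, second = order[0], order[1]
    S = R[first][second]
    T = row_sum(R, first) + row_sum(R, second) - 2.0 * S    # (7.17)
    history = [(S, T)]
    for v in range(2, len(order)):
        last = order[v]
        L = sum(R[j][last] for j in order[:v])              # (7.18)
        S = S + L                                           # (7.19)
        T = T + row_sum(R, last) - 2.0 * L                  # (7.20)
        history.append((S, T))
    return history


# Hypothetical correlation matrix, for illustration only.
R = [
    [1.0, 0.6, 0.5, 0.3, 0.2],
    [0.6, 1.0, 0.4, 0.3, 0.1],
    [0.5, 0.4, 1.0, 0.2, 0.2],
    [0.3, 0.3, 0.2, 1.0, 0.5],
    [0.2, 0.1, 0.2, 0.5, 1.0],
]
ORDER = [0, 1, 2, 3]
HISTORY = recurrent_S_T(R, ORDER)
for v, (S, T) in enumerate(HISTORY, start=2):
    print(v, round(S, 3), round(T, 3), direct_S_T(R, ORDER[:v]))
```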

Actual computations of B-coefficients are shown later in this chapter, but a word
of interpretation is in order here. A value of B = 100 means that the average of the
intercorrelations of the selected subset is exactly the same as the average correlation
of these variables with all the remaining ones. Such variables would not be regarded
as “belonging together” any more than they belong with the other variables of the
total set. As an arbitrary standard for “belonging together”, a group of variables
may be required to have a minimum B-coefficient of 130. Actually, there is no sampling
error formula for judging the significance of the difference between two successive
values of B. Knowledge of the nature of the variables may be of some assistance in
deciding whether a drop in B is “significantly” large.

Since the B-coefficient is the ratio of two averages, its properties may be studied
by means of them. The average of the intercorrelations of the variables in the group
(the numerator of B) tends to decrease as the number of variables in B increases,
since the variables are added on the basis of highest correlation with those already
in the argument of B. Similarly, the average of the correlations of the variables in the
group with all remaining variables (the denominator of B) tends to decrease with
an increase in $v$. The decrease in the numerator is relatively greater, however, than
that in the denominator.

To the numerator, which usually consists of a small number of correlations, are
added a few additional correlations which are lower in value than the others and thus
steadily decrease its value. On the other hand, from the large number of correlations
in the denominator a small number of the larger values is taken away. The value of
the denominator is decreased, but not so noticeably as that of the numerator. Thus
the B-coefficient decreases, in general, as more variables are added to its argument.




An exception to this may occur with the addition of a variable to the subset which 
has relatively high correlations with the preceding variables but a low sum of correlations
with the remaining variables. In this case the decrease in the numerator is
relatively smaller than that in the denominator, and B increases. Similar reasoning 
accounts for the fact that a variable can be rejected from a group because of a large 
drop in the value of B and then appear in the group later, after several others have 
been added to the subset. 

As the number of variables in the argument of B increases, the decreases in the 
above averages become less and these averages tend toward stability. A consequence 
of this is that an actual difference between two successive B values has a greater 
relative significance as the number of variables v increases. 


7.5. Bi-Factor Solution 

A bi-factor solution for $n$ variables, in its simplest form, consists of $n$ linear
equations involving one general factor and $m$ group factors (ignoring the unique
factors). If the variables are rearranged according to the groups in which they fall,
every variable will be expressed in terms of the general factor $F_0$, and, in addition,
the variables in the first group will involve $F_1$; those in the second group, $F_2$; and
so on to the variables in the last group which will involve $F_m$. In such a simple model,
each of the group factors overlaps with the general factor but not with any of the
other group factors. From such a model, the reproduced correlation between any
two variables $j \in G_p$ and $k \in G_q$ is given by:

(7.21)
$r'_{jk} = \sum ([a_{j0}F_{0i} + a_{jp}F_{pi}][a_{k0}F_{0i} + a_{kq}F_{qi}];\ i = 1, 2, \dots, N)/N$
$\quad\ \ = a_{j0}a_{k0} + a_{j0}a_{kq}r_{F_0F_q} + a_{jp}a_{k0}r_{F_pF_0} + a_{jp}a_{kq}r_{F_pF_q}$
$\quad\ \ = a_{j0}a_{k0} + a_{jp}a_{kq}r_{F_pF_q},$

where the last equality follows from the fact that the general factor is uncorrelated
with the group factors. If the two variables are in different groups, then

(7.22)    $r'_{jk} = a_{j0}a_{k0} \qquad (j \in G_p,\ k \in G_q,\ p \neq q),$

but, if they are in the same group, their reproduced correlation becomes

(7.23)    $r'_{jk} = a_{j0}a_{k0} + a_{jp}a_{kp} \qquad (j, k \in G_p).$
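
Formulas (7.22) and (7.23) amount to a simple rule: multiply the general-factor loadings, and add the product of the group-factor loadings only when both variables fall in the same group. A minimal sketch, using a hypothetical pattern of nine variables in three groups (none of these numbers come from the text):

```python
# Hypothetical bi-factor pattern: (general loading, group index, group loading).
PATTERN = [
    (0.7, 1, 0.5), (0.6, 1, 0.4), (0.5, 1, 0.6),
    (0.6, 2, 0.5), (0.5, 2, 0.5), (0.4, 2, 0.3),
    (0.7, 3, 0.4), (0.6, 3, 0.5), (0.5, 3, 0.4),
]


def reproduced(j, k):
    """Reproduced correlation between two distinct variables j and k."""
    a_j0, p, a_jp = PATTERN[j]
    a_k0, q, a_kq = PATTERN[k]
    r = a_j0 * a_k0             # (7.22): general factor only
    if p == q:
        r += a_jp * a_kq        # (7.23): same group adds the group-factor term
    return r


print(round(reproduced(0, 3), 3))   # variables in different groups
print(round(reproduced(0, 1), 3))   # variables in the same group
```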

The bi-factor solution is obtained by first computing the general factor coefficients 
and then the coefficients of each of the group factors. Sometimes this is followed by 
some adjustments to the simple model, as may be suggested by a study of the 
residuals. 

1. General-factor coefficients. —By appropriate choice of variables, a subset can be 
found such that the conditions for a single common factor are satisfied; and hence 
the procedure of 7.2 may be applied to get the general-factor coefficients. As a first 
step the variables are placed into groups, toward which end B-coefficients may be
useful. Then, it will be noted that a subset of variables consisting of one each from
different groups satisfies the form of the two-factor pattern: there is only a general
factor among the variables of such a subset, and a unique factor for each variable.

To indicate how a subset of variables is selected which involves only one common
factor, consider three distinct groups $G_p$, $G_q$, and $G_r$, and take one variable from each
of them. Such a triplet may be designated $(e, j, k)$, where $e \in G_r$, $j \in G_p$, and $k \in G_q$
and $r \neq p \neq q$. Any one of the many triplets, as the indices take specific values in
the different groups, has the property of involving only one common factor. From
this property, it follows that the reproduced correlation between any two of these
variables is given by (7.22). Next, replace these correlations obtained from the factor
pattern by observed correlations* and multiply this equation by $a_{e0}^2$ to get:

(7.24)    $a_{e0}^2 r_{jk} = (a_{e0}a_{j0})(a_{e0}a_{k0}) = r_{ej}r_{ek} \qquad (e \in G_r,\ r \neq p \neq q).$

From this expression a value of the general-factor coefficient for variable $z_e$ can be
computed based entirely upon the correlations involving only the particular variables
$z_j$ and $z_k$. To obtain a more reliable evaluation of any general-factor coefficient,
sum both sides of (7.24) for all values† of $j$ and $k$ which, together with $e$, preserve the
property of involving only one common factor. Thus, for any $e \in G_r$ the square of its
general-factor coefficient is given by:

(7.25)    $a_{e0}^2 = \dfrac{\sum (r_{ej}r_{ek};\ j \in G_p,\ k \in G_q,\ p < q = 1, 2, \dots, m,\ p, q \neq r)}{\sum (r_{jk};\ j \in G_p,\ k \in G_q,\ p < q = 1, 2, \dots, m,\ p, q \neq r)}$

The formula is comparable to (7.8) for the case where all the variables involve only
one common factor.

An important extension of formula (7.25) should be noted. If the pattern plan of a
set of variables is of the bi-factor form, but includes a number of variables which
measure only the general and no group factors, additional terms can be included
in this formula, because any two such variables, together with any other variable $e$,
involve only one common factor. The summations in (7.25) should extend to all
such variables $j$, $k$; the only restriction being $j < k$ so that no correlation should be
used more than once.
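
Formula (7.25) can be tried on correlations generated from a known pattern, where the answer is available for comparison. In the sketch below the loadings, the grouping and the function names are all hypothetical; the correlations are built exactly from a bi-factor pattern, so the residuals vanish as the footnoted assumption requires.

```python
import math

# Hypothetical bi-factor pattern: nine variables in three groups.
GENERAL = [0.7, 0.6, 0.5, 0.6, 0.5, 0.4, 0.7, 0.6, 0.5]
GROUP_LOADING = [0.5, 0.4, 0.6, 0.5, 0.5, 0.3, 0.4, 0.5, 0.4]
GROUP_OF = [1, 1, 1, 2, 2, 2, 3, 3, 3]
N_VARS = len(GENERAL)


def correlation(j, k):
    """Correlation implied by the pattern for two distinct variables."""
    r = GENERAL[j] * GENERAL[k]
    if GROUP_OF[j] == GROUP_OF[k]:
        r += GROUP_LOADING[j] * GROUP_LOADING[k]
    return r


R = [[1.0 if j == k else correlation(j, k) for k in range(N_VARS)]
     for j in range(N_VARS)]


def general_loading(R, group_of, e):
    """General-factor coefficient of variable e by (7.25)."""
    numerator = denominator = 0.0
    for j in range(len(R)):
        for k in range(len(R)):
            p, q = group_of[j], group_of[k]
            # j and k lie in two different groups (p < q), neither being e's group.
            if p < q and p != group_of[e] and q != group_of[e]:
                numerator += R[e][j] * R[e][k]
                denominator += R[j][k]
    return math.sqrt(numerator / denominator)


ESTIMATES = [general_loading(R, GROUP_OF, e) for e in range(N_VARS)]
print([round(a, 3) for a in ESTIMATES])
```

With three groups, each coefficient rests on the nine pairs drawn from the two groups other than its own; with observed data the ratio is only an estimate, since the residuals then do not vanish exactly.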

2. Group-factor coefficients.—Before the group-factor coefficients can be computed 
it is necessary to obtain the residual correlations with the general factor removed. 
Such general-factor residuals are defined by 

(7.26)    $\dot{r}_{jk} = r_{jk} - a_{j0}a_{k0} \qquad (j, k = 1, 2, \dots, n).$

These residual correlations tend to be of the form shown in Figure 6.1, with the values
in the rectangles being approximately zero. The standard error of a residual correlation
is indicated in Table A of the Appendix.

* The tacit assumption is that the residuals vanish.

† If $j$ and $k$ are merely restricted to be in different groups, and the variables range over all groups,
each correlation would appear twice, since $r_{jk} = r_{kj}$. To avoid this, the indices $j$ and $k$ are permitted
to range over all groups, namely, $j \in G_p$ and $k \in G_q$ for $p, q = 1, 2, \dots, m$, but under the
condition $p < q$.
In the residual-factor space, the $n$ variables can be described by a uni-factor
pattern, i.e., the bi-factor solution may be considered as a uni-factor one with a
general factor superimposed. The residual correlations (7.26) for each group of
variables, taken alone, should have a matrix of rank one and hence measure only
one common factor. While the method of 7.2 can be used for calculating the group-factor
coefficients, there will usually be a relatively small number of variables in
each group, and it seems more advisable to use the method of triads of 5.3. By this
procedure the group-factor coefficient for any variable $e \in G_p$ is given by

(7.27)    $a_{ep}^2 = \dfrac{1}{\nu_p} \sum \left( \dfrac{\dot{r}_{ej}\dot{r}_{ek}}{\dot{r}_{jk}};\ j, k \in G_p,\ j < k,\ j, k \neq e \right) \qquad (p = 1, 2, \dots, m),$

where the $\dot{r}$ are the general-factor residuals of (7.26),

(7.28)    $\nu_p = \binom{n_p - 1}{2}$

and $n_p$ is the number of variables in the group $G_p$.

The complete determination of a bi-factor pattern is possible by means of formulas
(7.25) and (7.27). After all the coefficients have been computed, the final residuals
can be obtained. These are the residuals with all factors removed, namely (writing
$\dot{r}_{jk}$ for the general-factor residuals (7.26) and $\bar{r}_{jk}$ for the final residuals):

(7.29)
$\bar{r}_{jk} = r_{jk} - r'_{jk}$
$\quad\ \ = r_{jk} - a_{j0}a_{k0} - a_{jp}a_{kp}$
$\quad\ \ = \dot{r}_{jk} - a_{jp}a_{kp}.$

If the variables $j$ and $k$ do not belong to the same group, then $\bar{r}_{jk} = \dot{r}_{jk}$, i.e., the
general-factor residuals are the final residuals and must not be significantly different
from zero. On the other hand, the general-factor residuals for variables within a
group must be significant to warrant the postulation of an additional factor among
them. Approximate sampling errors of residuals are given in the Appendix.
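
The second stage can be sketched in the same spirit. The example below assumes a hypothetical pattern with one group of four variables and one of three, takes the general-factor loadings as already known, forms the general-factor residuals, averages the triads within each group to obtain the group-factor loadings, and then forms the final residuals. The divisor used for the average is simply the number of triads available for the variable; all names and numbers are illustrative.

```python
import math
from itertools import combinations

# Hypothetical pattern: group 1 has four variables, group 2 has three.
GENERAL = [0.7, 0.6, 0.5, 0.6, 0.5, 0.4, 0.6]
GROUP_LOADING = [0.5, 0.4, 0.6, 0.3, 0.5, 0.3, 0.4]
GROUP_OF = [1, 1, 1, 1, 2, 2, 2]
N_VARS = len(GENERAL)


def correlation(j, k):
    """Correlation implied by the pattern for two distinct variables."""
    r = GENERAL[j] * GENERAL[k]
    if GROUP_OF[j] == GROUP_OF[k]:
        r += GROUP_LOADING[j] * GROUP_LOADING[k]
    return r


def general_residual(j, k):
    """Residual with the general factor removed, as in (7.26)."""
    return correlation(j, k) - GENERAL[j] * GENERAL[k]


def group_loading_by_triads(e):
    """Group-factor loading of e: square root of the mean triad, as in (7.27)."""
    members = [j for j in range(N_VARS) if GROUP_OF[j] == GROUP_OF[e] and j != e]
    triads = [general_residual(e, j) * general_residual(e, k) / general_residual(j, k)
              for j, k in combinations(members, 2)]
    return math.sqrt(sum(triads) / len(triads))


ESTIMATED = [group_loading_by_triads(e) for e in range(N_VARS)]


def final_residual(j, k):
    """Residual with all factors removed, as in (7.29)."""
    res = general_residual(j, k)
    if GROUP_OF[j] == GROUP_OF[k]:
        res -= ESTIMATED[j] * ESTIMATED[k]
    return res


LARGEST_FINAL = max(abs(final_residual(j, k))
                    for j, k in combinations(range(N_VARS), 2))
print([round(a, 3) for a in ESTIMATED], LARGEST_FINAL)
```

Because the hypothetical correlations fit the pattern exactly, the final residuals are zero up to rounding; with observed data they would be compared with their approximate sampling errors instead.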

3. Adjustments.—When it is found that certain residuals between variables of 
different groups are significant, it may be necessary to modify the simple bi-factor 
plan. The same is true if certain general-factor residuals within a group are practically 
zero. Since the bi-factor solution involves the formulation of a mathematical model 
(the pattern plan) and the calculation of numerical values for this model, it should 
be possible to establish some measure of “goodness of fit.” Rough approximations 
to the sampling errors for factor coefficients and residuals (indicated in the Appendix) 
may serve as guides in verifying the appropriateness of a particular factor solution 
to the given data. Moreover, procedures are now available for fitting a prescribed
factor pattern by statistical means [see 286, 287, 416]. The prescribed model could,
of course, be a bi-factor plan. Before such procedures were available, the practice
was first to obtain a simple or pure bi-factor pattern, and then to make certain
modifications in the course of the analysis. A new or revised pattern plan might
be suggested from the B-coefficients for different subgroups of variables, from the
crude tests of significance of residuals, or simply from inspection of the general-factor
residuals.
7.6. Illustrative Example

In order to illustrate the bi-factor solution, and also for subsequent factorial 
methods, an empirical set of data will be employed. The particular example consists
of twenty-four psychological tests given to 145 seventh and eighth grade school 
children in a suburb of Chicago, and is largely an outgrowth of the Spearman- 
Holzinger Unitary Trait Study [234]. The initial data were gathered by Holzinger 
and Swineford [245], while applications of these data by Holzinger and Harman [243], 
Kaiser [293], Neuhaus and Wrigley [379] and others have made this example a classic 
in factor analysis literature. 

The list of the twenty-four variables, their means, standard deviations, and reliability
coefficients are given in Table 7.3. Their correlations are presented in Table 7.4.
In the last row of the latter table is given the sum of the correlations for each variable 
with all the others. In obtaining such a sum for any variable, all 23 entries in its row 
and column must be added since only half of the symmetric matrix is recorded in 
the table. 


Table 7.3

Basic Statistics for Twenty-Four Psychological Tests

                                               Standard     Reliability
Test                                 Mean      Deviation    Coefficient

 1. Visual Perception               29.60        6.90          .756
 2. Cubes                           24.84        4.50          .568
 3. Paper Form Board                15.65        3.07          .544
 4. Flags                           36.31        8.38          .922
 5. General Information             44.92       11.75          .808
 6. Paragraph Comprehension          9.95        3.36          .651
 7. Sentence Completion             18.79        4.63          .754
 8. Word Classification             28.18        5.34          .680
 9. Word Meaning                    17.24        7.89          .870
10. Addition                        90.16       23.60          .952
11. Code                            68.41       16.84          .712
12. Counting Dots                  109.83       21.04          .937
13. Straight-Curved Capitals       191.81       37.03          .889
14. Word Recognition               176.14       10.72          .648
15. Number Recognition              89.45        7.57          .507
16. Figure Recognition             103.43        6.74          .600
17. Object-Number                    7.15        4.57          .725
18. Number-Figure                    9.44        4.49          .610
19. Figure-Word                     15.24        3.58          .569
20. Deduction                       30.38       19.76          .649
21. Numerical Puzzles               14.46        4.82          .784
22. Problem Reasoning               27.73        9.77          .787
23. Series Completion               18.82        9.35          .931
24. Arithmetic Problems             25.83        4.70          .836
1. B-coefficients. First, the grouping of variables as described in 7.4 will be
illustrated. The analysis into groups is begun by selecting the two tests, 5 and 9,
which have the highest correlation, namely, $r_{59} = .723$. The value of $B(5, 9)$ is computed
by means of formula (7.16) in Table 7.5. The tests $j$, comprising the argument
of $B$, are for this case $z_5$ and $z_9$, and their correlation appears as the value of $L$ and $S$,
since there is only this one correlation in each of the sums. The value $T$ may be
obtained by (7.17) as follows:

$T = \sum r_{5e} + \sum r_{9e} - 2r_{59} = 8.242 + 8.156 - 2(.723) = 14.952,$

where the sums of the correlations are taken from Table 7.4. Then the value of the
B-coefficient is

$B(5, 9) = 200(24 - 2)(.723)/[(2 - 1)(14.952)] = 213.$

The form of computation indicated by Table 7.5 will be found very convenient.
In addition, the sum $L$ defined in (7.18) and the recurrence formulas (7.19) and (7.20)
greatly facilitate the calculation of successive $B$ values.

To illustrate the use of these formulas, the value $B(17, 18, 19, 15, 16)$ will be calculated,
showing all details. Here, $v = 5$, and the last variable added, $l$, is 16. From
definition (7.18), using conventional summation notation with $j$ ranging over the
four variables of the group other than the last one, there results

$L_5 = \sum_{j \neq 16} r_{j,16} = r_{17,16} + r_{18,16} + r_{19,16} + r_{15,16} = 1.251,$

where the individual correlations are taken from Table 7.4. The sum $S_5$ may be
obtained from the value $S_4$ by means of equation (7.19) as follows:

$S_5 = S_4 + L_5 = 2.001 + 1.251 = 3.252.$

It should be noted that there are two $S_4$ values but that one has been rejected as
indicated by note (10). The next entry is merely

$200(n - v) = 200(24 - 5) = 3,800.$

The sum $T_5$ is given by formula (7.20) in terms of the preceding value $T_4$, which is in
the row (17, 18, 19, 15) for $v = 4$, not in the row (17, 18, 19, 16). Its value is

$T_5 = T_4 + \sum_{e \neq 16} r_{e,16} - 2L_5 = 19.648 + 6.609 - 2(1.251) = 23.755,$

where the sum of the correlations of Test 16 with all other tests is taken from Table 7.4.
Then

$(v - 1)T = 4(23.755) = 95.020,$

and

$B(17, 18, 19, 15, 16) = 3800(3.252)/95.020 = 130.$
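
The two worked values above can be retraced in a few lines. The sketch uses only the sums quoted in the text (the row sums 8.242, 8.156 and 6.609, and the entries $S_4 = 2.001$, $T_4 = 19.648$, $L_5 = 1.251$); the function name is invented for the illustration.

```python
def b_from_sums(n, v, S, T):
    """B-coefficient by formula (7.16)."""
    return 200.0 * (n - v) * S / ((v - 1) * T)


N_TESTS = 24

# B(5, 9): a single correlation, with T obtained from (7.17).
S2 = 0.723
T2 = 8.242 + 8.156 - 2 * 0.723
B_5_9 = b_from_sums(N_TESTS, 2, S2, T2)

# B(17, 18, 19, 15, 16): recurrences (7.19) and (7.20) from the v = 4 row.
L5 = 1.251
S5 = 2.001 + L5
T5 = 19.648 + 6.609 - 2 * L5
B_MEMORY = b_from_sums(N_TESTS, 5, S5, T5)

print(round(T2, 3), round(B_5_9))
print(round(S5, 3), round(T5, 3), round(B_MEMORY))
```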

Following the procedure outlined in 7.4, all the variables are grouped by means
of B-coefficients. The groups $G_p$ ($p = 1, 2, 3, 4, 5$), as determined in Table 7.5, may be
defined by

(7.30)
$G_1 = (1, 2, 3, 4),$
$G_2 = (5, 6, 7, 8, 9),$
$G_3 = (10, 11, 12, 13),$
$G_4 = (14, 15, 16, 17, 18, 19),$
$G_5 = (20, 21, 22, 23, 24).$

The grouping by B-coefficients adheres to the original design of this set of tests
because most of these tests had been used in factor experiments before and because
the abilities they measured were quite well known. This may not be true in general.
In a number of published studies the B-coefficients and the succeeding factor solutions
failed to verify some postulated factors.*

2. Bi-factor solution. A simple bi-factor plan of a general factor and five distinct
group factors is implied by the grouping of variables (7.30), and is subsequently
modified as indicated in the course of analysis. In calculating the bi-factor loadings
it is tacitly assumed that all entries in the correlation matrix are positive (small
negative values are treated as zeros). The general-factor coefficients are computed
by means of formula (7.25), making use of appropriate worksheets to facilitate the
numerical work. The loadings on the general factor are in the first column of the
body of Table 7.6, which includes the entire bi-factor solution.

The general-factor residuals are computed by (7.26) and are presented below the
diagonal in Table 7.7. These residuals should be insignificant except for those within
groups, if the grouping of variables is reasonable. In the first four groups the intragroup
residuals are consistently positive and larger than the intergroup values (with
the one exception of the residual between variables 10 and 24, which is .255). However,
within $G_5$ there are a number of negative residuals and very small positive ones,
indicating that no additional factor is required for these variables. Hence, the bi-factor
plan is modified by eliminating the group factor for $G_5$, and the general-factor
residuals among these variables are taken as the final residuals.

The coefficients of each of the group factors are computed by means of
formula (7.27), employing the general-factor residuals (in italics) in Table 7.7. These
calculations for the loadings on $B_1$, $B_2$, and $B_3$ are straightforward, and the results
are shown in Table 7.6. When the triads in (7.27) were computed for any variable
in $G_4$, however, it was found that they varied widely and hence the six variables
could not be assumed to measure a single factor. In fact, mere inspection of the
general-factor residuals points to a grouping of 14, 15, 16, and 17; another grouping
of 17, 18, and 19; and acceptance of the general-factor residuals for 14, 15, and 16

* An example of this appeared in the factor solution obtained by Holzinger and Harman [241].
In preparing the test battery, Professor Thurstone had postulated that “verbal reasoning,
numerical reasoning and space reasoning would be separate factors and that these would be 

Verb ? abStraCt] T a [ d vlsual ima S er y” [470, p. 11], Both solutions cut across 
these predetermined groupings that had guided the test construction and revealed some different 



Table 7.5

Calculation of B-Coefficients

B(j) = 200(n - v)S / [(v - 1)T]

j                                   v      L        S     200(n-v)      T       (v-1)T    B(j)   Notes

(5, 9)                              2    .723     .723      4400     14.952     14.952    213
(5, 9, 7)                           3   1.341    2.064      4200     20.179     40.358    215
(5, 9, 7, 6)                        4   2.058    4.122      4000     24.245     72.735    227
(5, 9, 7, 6, 8)                     5   2.256    6.378      3800     28.039    112.156    216
(5, 9, 7, 6, 8, 23)                 6   2.276    8.654      3600     32.165    160.825    194    (1)

(10, 12)                            2    .585     .585      4400     10.373     10.373    248
(10, 12, 13)                        3    .920    1.505      4200     16.117     32.234    196    (2)
(10, 12, 11)                        3    .912    1.497      4200     15.638     31.276    201    (3)
(10, 12, 11, 13)                    4   1.455    2.952      4000     20.312     60.936    194    (4)
(10, 12, 11, 13, 24)                5   1.715    4.667      3800     25.102    100.408    177    (5)
(10, 12, 11, 13, 21)                5   1.584    4.536      3800     24.750     99.000    174    (6)

(20, 23)                            2    .509     .509      4400     15.415     15.415    145    (7)
(1, 4)                              2    .468     .468      4400     12.637     12.637    163    (8)
(1, 4, 3)                           3    .708    1.176      4200     16.530     33.060    149
(1, 4, 3, 2)                        4    .865    2.041      4000     19.551     58.653    139
(1, 4, 3, 2, 21)                    5   1.189    3.230      3800     24.779     99.116    124    (9)

(17, 18)                            2    .448     .448      4400     11.921     11.921    165
(17, 18, 19)                        3    .682    1.130      4200     16.311     32.622    145
(17, 18, 19, 16)                    4    .926    2.056      4000     21.068     63.204    130    (10)
(17, 18, 19, 15)                    4    .871    2.001      4000     19.648     58.944    136
(17, 18, 19, 15, 16)                5   1.251    3.252      3800     23.755     95.020    130
(17, 18, 19, 15, 16, 14)            6   1.530    4.782      3600     26.138    130.690    132
(17, 18, 19, 15, 16, 14, 24)        7   1.836    6.618      3400     30.686    184.116    122    (11)
(17, 18, 19, 15, 16, 14, 22)        7   1.704    6.486      3400     30.423    182.538    121    (12)

(20, 23)                            2    .509     .509      4400     15.415     15.415    145
(20, 23, 22)                        3    .966    1.475      4200     21.176     42.352    146
(20, 23, 22, 21)                    4   1.238    2.713      4000     26.306     78.918    138
(20, 23, 22, 21, 24)                5   1.623    4.336      3800     31.280    125.120    132
(20, 23, 22, 21, 24, 18)            6   1.652    5.988      3600     34.784    173.920    124    (13)
(20, 23, 22, 21, 24, 16)            6   1.641    5.977      3600     34.607    173.035    124    (13)


NOTES ON TABLE 7.5

(1) Test 23 is rejected because of 22 points’ drop in B for v = 6.

(2) Test 13 is rejected because 52 points’ drop in B seems to be too great even for v = 3.

(3) Test 11 is retained, although it causes a drop of 47 points in B, because it is of the same general nature as

(4) Test 13 is retained, although it was previously rejected from the group. After Test 11 was put in the
group, Test 13 seemed to belong together with the others, causing a drop of only 7 points in the value
of B for v = 4.

(5) Test 24 is rejected because of 17 points’ drop in B for v = 5.

[...] 20 and 23, another pair of tests will be tried.

(8) Tests 1 and 4, although having a lower correlation than the pair 20 and 23, produce a higher value of
the B-coefficient. Hence the next group is started with Tests 1 and 4.
Table 7.6

Bi-Factor Pattern for Twenty-Four Psychological Tests

              General     Spatial                Perceptual   Recog-    Associative
              Deduction   Relations   Verbal     Speed        nition    Memory
Test j        $B_0$       $B_1$       $B_2$      $B_3$        $B_4$     $B_5$         Doublet   Unique

 1              .589        .484        -          -            -         -             -        .647
 2              .357        .285        -          -            -         -             -        .889
 3              .401        .479        -          -            -         -             -        .781
 4              .463        .317        -          -            -         -             -        .828
 5              .582         -         .574        -            -         -             -        .576
 6              .575         -         .559        -            -         -             -        .597
 7              .534         -         .708        -            -         -             -        .463
 8              .624         -         .375        -            -         -             -        .686
 9              .560         -         .628        -            -         -             -        .540
10              .388         -          -         .594          -         -            .377      .595
11              .521         -          -         .478          -         -             -        .707
12              .404         -          -         .642          -         -             -        .652
13              .576         -          -         .438          -         -             -        .690
14              .388         -          -          -           .545       -             -        .743
15              .351         -          -          -           .476       -             -        .806
16              .496         -          -          -           .353       -             -        .793
17              .422         -          -          -           .361      .493           -        .670
18              .515         -          -          -            -        .468           -        .718
19              .442         -          -          -            -        .278           -        .853
20              .644         -          -          -            -         -             -        .765
21              .645         -          -          -            -         -             -        .764
22              .644         -          -          -            -         -             -        .765
23              .734         -          -          -            -         -             -        .679
24              .712         -          -          -            -         -            .377      .592

Contribution
of factor      6.874       0.645      1.678      1.185        0.779     0.539         0.284



with 18 and 19 as final residuals. These changes may be justified by the crude tests of
significance of residuals and by the B-coefficients for the two subsets of variables,
namely,

$B(14, 15, 16, 17) = 202$ and $B(17, 18, 19) = 177,$

where these calculations are based on the general-factor residuals among the six


Notes on Table 7.5 contd. 

(9) Test 21 is rejected because of 15 points’ drop in B for v = 5 and seemingly different nature of Test 21 
from Tests 1, 2, 3, and 4. 

(10) Test 16 is rejected because of 15 points’ drop in B to see if some other test will cause a smaller drop. If
some other test cannot be found which causes a smaller drop in B, then Test 16 will be retained in the
group at this stage because a drop of 15 points for v = 4 does not seem to be definitely significant.

(11) Test 24 is rejected because of 10 points’ drop in B for v = 7. 

(12) Test 22 is rejected because of 11 points’ drop in B for v = 7. 

(13) Tests 18 and 16, although they had previously been allocated to another group, are put into the argu¬ 
ment of B along with 20, 21, 22, 23, 24 to see if the latter group of tests must be extended to other 
tests in the battery. The drop in B in each case, along with the seemingly different nature of Tests 18 and 
16, seems to warrant their rejection from this group.

variables. The results of this modification lead to the factor coefficients of $B_4$ and
$B_5$ in Table 7.6.

Finally, one further adjustment was made for the exceptionally large general- 
factor residual between variables 10 and 24. A doublet (factor through only two 
variables) is assumed for these variables. As noted in Table 5.2, it takes at least 
three variables to determine the factor weights for a single factor uniquely, and 
when there are only two variables their description of a common factor is quite 
arbitrary. In the example, approximately one standard error was subtracted from 
the residual before the variance was divided equally between the two variables. The 
resulting doublet factor coefficients for the variables are shown in Table 7.6. 

To complete the linear description of each test in terms of factors there remains
the determination of the unique factor coefficients. These are given by

$d_j = \sqrt{1 - h_j^2},$

as described in 2.4. The sums of squares of coefficients of the common factors in
Table 7.6 are computed and recorded as the communalities in Table 7.8. The complement
of each of these numbers from unity is the uniqueness, which is also recorded
in Table 7.8. Then, taking square roots yields the coefficients of the respective unique
factors in Table 7.6.
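
The arithmetic just described can be sketched in a few lines of Python. This is only an illustration: the loadings for Tests 1, 5, and 17 are copied from Table 7.6, while the dictionary layout and the function name `unique_coefficient` are assumptions made for the example.

```python
import math

# Common-factor coefficients for three tests, copied from Table 7.6.
# The data layout and names are illustrative assumptions, not from the text.
pattern = {
    1: [0.589, 0.484],          # B_0 and B_1
    5: [0.582, 0.574],          # B_0 and B_2
    17: [0.422, 0.361, 0.493],  # B_0, B_4, and B_5
}

def unique_coefficient(loadings):
    """Return (communality, uniqueness, unique-factor coefficient d_j)."""
    communality = sum(a * a for a in loadings)   # sum of squared loadings
    uniqueness = 1.0 - communality               # complement from unity
    return communality, uniqueness, math.sqrt(uniqueness)

results = {test: unique_coefficient(a) for test, a in pattern.items()}
for test, (h2, d2, d) in results.items():
    print(f"test {test:2d}: h2 = {h2:.3f}  d2 = {d2:.3f}  d = {d:.3f}")
```

For these three tests the rounded values agree with the communalities and uniquenesses listed in Table 7.8 and with the unique-factor column of Table 7.6.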

While factorial methods yield the communality and uniqueness of each variable,
the latter variance may be split into the specificity and error variance simply from
the knowledge of the reliability of the variable. The reliability of each of the 24
psychological tests is recorded in Table 7.8, as well as the error variance and specificity
which follow from formulas (2.20). In addition to this apportionment of the unit
variance of each test, the index of completeness of factorization, as computed by (2.21),
is given in the last column of Table 7.8.
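
A minimal sketch of this apportionment follows, assuming the relations shown in the column headings of Table 7.8 (error variance as one minus the reliability, specificity as uniqueness minus error variance, and the index as 100 times communality over reliability). The function name `apportion` is hypothetical.

```python
def apportion(communality, reliability):
    """Split the unit variance of one test, following the headings of Table 7.8."""
    uniqueness = 1.0 - communality
    error_variance = 1.0 - reliability
    specificity = uniqueness - error_variance       # same as reliability - communality
    completeness = 100.0 * communality / reliability
    return uniqueness, error_variance, specificity, completeness

# Communality and reliability for Tests 1 and 7, copied from Table 7.8.
test_1 = apportion(0.581, 0.756)
test_7 = apportion(0.786, 0.754)
print([round(x, 3) for x in test_1])
print([round(x, 3) for x in test_7])
```

Test 7 is the case in which the communality exceeds the reliability, so the specificity comes out slightly negative and the index exceeds 100.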

It is of interest to judge how well a particular factor solution fits the empirical 
data. Generally, only crude procedures for “when to stop factoring” are employed 
(although more rigorous statistical tests are available for certain types of solutions 
as given in sec. 9.5 and 10.4). Among the rules employed is the decision, in advance, 
to analyze up to 50% (or 75 %) of the total variance of a battery of tests; or a suitable 
proportion of the total reliability, leaving room for some specific factors; or, that 
only factors which include at least 5% (or 2%) of the total variance can have any 
practical value, in the sense of being identifiable. Other approximate methods may 
involve some crude standards for the size and distribution of the final residuals. 

Table 7.8

Apportionment of Test Variances

Columns: h_j^2 = communality; r_jJ = reliability; d_j^2 = 1 - h_j^2 = uniqueness;
e_j^2 = 1 - r_jJ = error variance; b_j^2 = d_j^2 - e_j^2 = specificity;
C_j = 100 h_j^2 / r_jJ = index of completeness of factorization.

Test j    h_j^2     r_jJ     d_j^2    e_j^2    b_j^2     C_j
   1      .581      .756     .419     .244     .175      76.9
   2      .209      .568     .791     .432     .359      36.7
   3      .390      .544     .610     .456     .154      71.7
   4      .315      .922     .685     .078     .607      34.1
   5      .668      .808     .332     .192     .140      82.7
   6      .643      .651     .357     .349     .008      98.8
   7      .786      .754     .214     .246    -.032     104.2
   8      .530      .680     .470     .320     .150      77.9
   9      .708      .870     .292     .130     .162      81.4
  10      .503^a    .952     .497     .048     .449      52.8
  11      .500      .712     .500     .288     .212      70.2
  12      .575      .937     .425     .063     .362      61.4
  13      .524      .889     .476     .111     .365      58.9
  14      .448      .648     .552     .352     .200      69.1
  15      .350      .507     .650     .493     .157      69.0
  16      .371      .600     .629     .400     .229      61.8
  17      .551      .725     .449     .275     .174      76.0
  18      .484      .610     .516     .390     .126      79.4
  19      .273      .569     .727     .431     .296      47.9
  20      .415      .649     .585     .351     .234      63.9
  21      .416      .784     .584     .216     .368      53.1
  22      .415      .787     .585     .213     .372      52.7
  23      .539      .931     .461     .069     .392      57.9
  24      .507^b    .836     .493     .164     .329      60.6

a The communality with the doublet $D_1$ included is .641.
b The communality with the doublet $D_1$ included is .645.

In a bi-factor solution there is not as much choice of whether to continue factoring
or not. The pattern plan predetermines, in a large sense, the proportion of the total
variance that will be explained by the analysis. Cognizance of the kind of empirical
rules that are used in such a solution as the principal-factor type may also be of
benefit in judging the adequacy of a bi-factor solution. Thus, the common factors
of Table 7.6 account for just 50% of the total variance of the twenty-four tests, and
on this basis the solution would not be deemed “over-factored”. The doublet $D_1$
certainly is not identifiable; and $B_1$, $B_4$, and $B_5$ each account for less than 5% (but
more than 2%) of the total variance, so that their practical significance may be
questioned.
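
The percentages behind these rules of thumb can be checked with a short calculation. The sketch below takes the factor contributions from the last row of Table 7.6 and divides by the total variance of 24 (one unit per standardized test); the names used are illustrative assumptions.

```python
# Contributions of the factors, copied from the last row of Table 7.6.
contribution = {
    "B0": 6.874, "B1": 0.645, "B2": 1.678, "B3": 1.185,
    "B4": 0.779, "B5": 0.539, "D1": 0.284,
}
n_tests = 24  # total variance of 24 standardized tests

percent = {name: 100.0 * v / n_tests for name, v in contribution.items()}
total_percent = sum(percent.values())

# Factors falling between the 2% and 5% marks mentioned in the text.
between_2_and_5 = [name for name, p in percent.items() if 2.0 < p < 5.0]

for name, p in percent.items():
    print(f"{name}: {p:5.2f}% of total variance")
print(f"all factors together: {total_percent:.1f}%")
print("between 2% and 5%:", between_2_and_5)
```

On these figures $B_3$, at about 4.9%, also falls just under the 5% mark, while the doublet accounts for a little over 1%.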

The index $C_j$ of completeness of factorization provides another guide to the
adequacy of the solution. The analysis of psychological tests into common factors
should not be carried to the point where real specific factors disappear. In the
example there is only one value of $C_j$ in excess of 100. It is ignored, as probably due
to chance errors either in the factor weights or in the reliability coefficient. If, however,
there were several such values for high reliability coefficients, there would be good
reason to consider a modification of the factor solution.

In considering the statistical fit of the factor model in 2.6 the standard (2.30) was
suggested for judging the agreement of the reproduced correlations with the observed
ones. This requires the standard deviation of the final residuals to be less than the
standard error of a zero correlation. The frequency distribution of the final residuals
is presented in Table 7.9, where the standard deviation is shown to be .0655. Since
this value is less than the standard error (.0830) of a zero correlation for a sample of
145 cases, the required condition is satisfied.
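
As a rough check on these figures, the mean and standard deviation can be recomputed from the grouped frequencies of Table 7.9. The sketch assumes that each residual sits at its class midpoint, that the divisor is the number of residuals, and that the standard error of a zero correlation is $1/\sqrt{N}$; these are assumptions made for the illustration, chosen because they reproduce the quoted values.

```python
import math

# Frequencies from Table 7.9, keyed by the lower class limit in thousandths
# (150 stands for the class .150 to .169, -250 for -.250 to -.231).
# Empty classes are omitted.
freq = {
    150: 3, 130: 5, 110: 6, 90: 8, 70: 12, 50: 25, 30: 33, 10: 30,
    -10: 39, -30: 29, -50: 29, -70: 15, -90: 17, -110: 10,
    -130: 8, -150: 6, -250: 1,
}
n_cases = 145

def midpoint(lower_thousandths):
    """Class midpoint, e.g. (.150 + .169) / 2 = .1595."""
    return (lower_thousandths + 9.5) / 1000.0

total = sum(freq.values())
mean = sum(f * midpoint(lo) for lo, f in freq.items()) / total
variance = sum(f * (midpoint(lo) - mean) ** 2 for lo, f in freq.items()) / total
sd = math.sqrt(variance)
se_zero = 1.0 / math.sqrt(n_cases)  # standard error of a zero correlation

print(total, round(mean, 4), round(sd, 4), round(se_zero, 4), sd < se_zero)
```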

3. Factor names.— A few words about the naming of factors may be in order. 
It will be recalled that the fundamental purpose of factor analysis is to comprehend 
a large class of phenomena (the values of a set of variables) in terms of a small number 
of concepts (the factors); and, this description is taken to be a linear function of the 
factors. In a mathematical or physical theory it may be sufficient to know that twenty- 
four variables can be described linearly in terms of only six new hypothetical ones— 
that is usually quite an accomplishment, and it is of little concern as to what the six 
new variables are called. But in the biological and social sciences—psychology, for 
example—the practical identification of these new variables (the factors) makes it 
highly desirable to have them named. 


Table 7.9

Frequency Distribution of Final Residuals

Value of Residual      Frequency
  .150 to  .169            3
  .130 to  .149            5
  .110 to  .129            6
  .090 to  .109            8
  .070 to  .089           12
  .050 to  .069           25
  .030 to  .049           33
  .010 to  .029           30
 -.010 to  .009           39
 -.030 to -.011           29
 -.050 to -.031           29
 -.070 to -.051           15
 -.090 to -.071           17
 -.110 to -.091           10
 -.130 to -.111            8
 -.150 to -.131            6
 -.170 to -.151           --
 -.190 to -.171           --
 -.210 to -.191           --
 -.230 to -.211           --
 -.250 to -.231            1

Total                    276
Mean                  -.0004
Standard deviation     .0655




The coefficients of a factor pattern indicate the correlations of the variables with
the respective factors and furnish the basis for naming them. In the case of oblique
factors, to be discussed in later chapters, the structure furnishes the correlations of
the variables with the factors, and so it is similarly employed in naming the factors.
The investigator is guided by the magnitude of the factor weights in the selection
of appropriate names for the factors. The name selected is usually suggested by the
nature of the variables having the largest correlations with the factor under consideration.
This name should be consistent with the nature of the remaining variables
which have low correlations with the factor.

The common factors in the example are named from the pattern given in Table 7.6
and the brief descriptions of the tests. The factor $B_0$ has positive weights throughout
and correlates highest with such deductive tests as Series Completion (23), Woody-McCall
Arithmetic (24), Problem Reasoning (22), and Word Classification (8).
Hence $B_0$ might be called a “general deductive factor.” This name is consistent with
the nature of the remaining variables: those involving a lesser amount of deductive
ability have correspondingly smaller factor weights.

The remaining common factors are named from the subgroups of tests which 
have significant correlations with them. The first group factor is named from the 
“spatial” subgroup (Tests 1-4), the second from the “verbal” subgroup of tests, and 
similarly for the remaining factors. The names of the six common factors are indicated 
in Table 7.6. In addition to the common factors, there is one unique factor for each 
of the twenty-four tests. If a name were desired for any unique factor, it would be 
obtained from the description of the particular test. The only unnamed factor is the
doublet $D_1$ involved in Speed of Adding (10) and Woody-McCall Arithmetic (24).
This doublet appears to measure “arithmetic speed,” which might appear as a more
significant factor if more tests of this type were introduced in a battery designed
for this purpose.

For future work with this factor pattern the doublet will be dropped from consideration,
since, as was remarked before, it takes at least three variables to define
a factor. The six common factors may be referred to by means of symbols or the
descriptive names, which are tentatively assigned for that purpose. The particular
name by which a factor is designated, however, should not raise an issue for dispute.
If another investigator chooses to call these factors by other names, he is free to do so.
The naming of factors is not a problem of factor analysis, which is a branch of statistics,
but some descriptive names may be highly desirable in a particular field for purposes
of classification.

Principal-Factor and Related Solutions


8.1. Introduction 


The principal-factor solution is probably the most widely used technique in factor
analysis. It was not always that way; the method requires considerable calculations
which were much too time-consuming before electronic computers were available.
The foundation for “the method of principal axes” was laid at the turn of the century
by Karl Pearson [386]. However, it was not until the 1930’s that the principal-factor
method as we now know it was developed by Hotelling [259] at the suggestion of
Kelley. Subsequently, Kelley [305] developed an alternative procedure which twenty
years later proved to be the most useful one for adaptation to high-speed electronic
computers (although the later work was done independently). The first applications
of electronic computers to this problem in factor analysis were made by Wrigley
and Neuhaus [539, 540].


The methods of this chapter are addressed to the first of the two alternative 
objectives distinguished in 2.3—to extract the maximum variance from the observed 
variables. A typical situation where this is useful is in the reduction of a large body 
of data to a more manageable set. Of greatest interest, of course, are the measurements 
that vary the most among the individuals. Therefore, if a small number of linear 
combinations of the original variables can be found which account for most of the 
variance then considerable parsimony is gained. 

Actually, three methods are considered in this chapter. The fundamental method
is “component analysis,” which is introduced in 8.2. Then, an adaptation of it, the
principal-factor solution, is developed in sections 8.3-8.6 for both hand methods
and computer operations. The procedures with a desk calculator are presented in
detail in 8.5, using the example of eight physical variables. Four different sets of
data are used in 8.7 to illustrate various features of the principal-factor solution.
A technique for fixing the coordinate system in a given factor space for any solution
is developed in 8.8 and the relationship to the principal-factor method is noted.
Finally, an approximation to the principal-factor method, the centroid method,
is described in 8.9. This was very popular before computers made the principal-factor
solution feasible.


8.2. Component Analysis 

The method of principal components, or component analysis, is based upon the
early work of Pearson [386] with the specific adaptations to factor analysis suggested
by the work of Hotelling [259]. As noted in 6.2, when the point representation of a
set of variables is employed, the loci of uniform frequency density are essentially
concentric, similar, and similarly situated ellipsoids. The axes of these ellipsoids
correspond to the principal components. The method of component analysis, then,
involves the rotation of coordinate axes to a new frame of reference in the total
variable space: an orthogonal transformation wherein each of the n original variables
is describable in terms of the n new principal components.

An important feature of the new components is that they account, in turn, for a
maximum amount of variance of the variables. More specifically, the first principal
component is that linear combination of the original variables which contributes a
maximum to their total variance; the second principal component, uncorrelated
with the first, contributes a maximum to the residual variance; and so on until the
total variance is analyzed. The sum of the variances of all n principal components
is equal to the sum of the variances of the original variables.

Since the method is so dependent on the total variance of the original variables,
it is most suitable when all the variables are measured in the same units. Otherwise,
by change of units or other linear transformations of the variables, the ellipsoids
could be squeezed or stretched so that their axes (the principal components) would
have no special meaning. Hence, it is customary to express the variables in standard
form, i.e., to select the unit of measurement for each variable so that its sample
variance is one. Then, the analysis is made on the correlation matrix, with the total
variance equal to n. For such a matrix (symmetric, positive definite), all n principal
components are real and positive.

The formal development of the method of component analysis will not be presented.*
Instead, the development will be for the factor-analytic adaptation of the
method. The important distinction is that the model (2.8) is employed in component
analysis as contrasted to the model (2.9) for factor analysis. All the variance of the
variables is analyzed in terms of the principal components, while the communality
is analyzed in terms of the common factors. Hence, the distinction comes from the
amount of variance analyzed, that is, the numbers placed in the diagonal of the correlation
matrix. Analysis of the correlation matrix, with ones in the diagonal, leads to principal
components, while analysis of the correlation matrix with communalities leads to
principal factors. It will be shown in chapter 16 that the principal components can
be expressed simply in terms of the observed variables, while approximation procedures
are required for the measurement of factors.

* For detailed discussion of the method of principal components, see the original work of
Hotelling [259] and the more recent treatment by Anderson [11, chap. 11]; and for consideration
of statistical inference in component analysis, see Anderson [12].

Before the mathematical and computing procedures are presented, it may be of
interest to see what the results of a component analysis look like. This is shown in
Table 8.1 for the simple numerical example introduced in chapter 2. The main body
of the table contains numbers which, when read by columns, correspond to the
principal components, and when read by rows correspond to the variables. Thus,
the entries in the rows are the coefficients of the P’s in the linear expressions (2.8)
for the z’s; they are also the correlations of the variables with the principal components.
The direction of a principal component may be reflected, i.e., all entries in
a column may be multiplied by −1 without affecting any of the results.

Table 8.1

Principal Components for Five Socio-Economic Variables^a

Variable     P_1       P_2       P_3       P_4       P_5      Variance
    1        .5810     …         .0276     …         …        1.0000
    2        .7671    -.5448     .3193     …         …        1.0002
    3        .6724     …         .1149     …         .0862     .9999
    4        .9324    -.1043    -.3078     …         .0000    1.0000
    5        .7911    -.5582    -.0647    -.2413     .0102     .9999

Variance    2.8733    1.7966     .2148     .1000     .0153    5.0000
Per cent     57.5      35.9       4.3       2.0       0.3     100.0

a Again, the only reason for showing the results to so many decimal places is to provide a means for checking
numerical calculations.

The sums of squares of entries in the rows are the variances of the variables. The 
variance of each principal component (sum of squares in each column) is shown in 
the next to last row. The principle of maximum contribution to variance of each 
successive component is clearly demonstrated. In this example, the first two principal 
components account for more than 93 per cent of the variance—leading to a reduction 
in the data that would satisfy the most discerning investigator. 
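
The row and column properties just described can be seen in the smallest possible case, two standardized variables. The sketch below uses a hypothetical correlation of .60 (not data from the text) and relies on the standard closed form for a 2 × 2 correlation matrix, whose eigenvalues are 1 + r and 1 − r.

```python
import math

r = 0.60  # hypothetical correlation between two standardized variables

# Closed-form eigenvalues and unit eigenvectors of [[1, r], [r, 1]].
eigenvalues = [1 + r, 1 - r]
s = 1 / math.sqrt(2)
unit_vectors = [(s, s), (s, -s)]

# Component coefficients: unit eigenvector scaled by the root of its eigenvalue.
# loadings[j][p] is the coefficient of component P_(p+1) for variable j+1.
loadings = [
    [math.sqrt(lam) * q[j] for lam, q in zip(eigenvalues, unit_vectors)]
    for j in range(2)
]

row_variance = [sum(a * a for a in row) for row in loadings]
column_variance = [sum(loadings[j][p] ** 2 for j in range(2)) for p in range(2)]
reproduced_r = sum(loadings[0][p] * loadings[1][p] for p in range(2))
per_cent = [100 * v / sum(column_variance) for v in column_variance]

print(row_variance, column_variance, per_cent, reproduced_r)
```

Each row sums (in squares) to the unit variance of its variable, each column sums to the variance of its component, and the two components together reproduce the correlation exactly.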

8.3. Principal-Factor Method 

While the method of principal components was devised for the model (2.8),
Thomson [459] was the first to apply it to the classical factor analysis model,
although, at the time, his application was only to the Spearman two-factor solution.
More generally, by the “method of principal factors” is meant the application of the
method of principal components to the reduced correlation matrix (i.e., with communalities
in place of the ones in the principal diagonal). This is the method that is
developed in detail in this and the following three sections.

From the classical factor-analysis model (2.9), the relevant portion for the determination
of the common-factor coefficients may be written:

(8.1)    $z_j = a_{j1} F_1 + \cdots + a_{jp} F_p + \cdots + a_{jm} F_m \qquad (j = 1, 2, \ldots, n),$

where the unique factor has been omitted, and hence the precise representation
should be $z'_j$ but the primes are dropped for simplicity. The sum of squares of factor
coefficients gives the communality of a particular variable, while any term $a_{jp}^2$
indicates the contribution of the factor $F_p$ to the communality of $z_j$. The first stage
of the principal-factor method involves the selection of the first-factor coefficients
$a_{j1}$ so as to make the sum of the contributions of that factor to the total communality
a maximum. This sum is given by

(8.2)    $V_1 = a_{11}^2 + a_{21}^2 + \cdots + a_{n1}^2,$

and the coefficients $a_{j1}$ must be chosen so as to make $V_1$ a maximum under the
conditions

(8.3)    $r_{jk} = \sum_{p=1}^{m} a_{jp} a_{kp} \qquad (j, k = 1, 2, \ldots, n),$

where $r_{jk} = r_{kj}$ and $r_{jj}$ is the communality $h_j^2$ of variable $z_j$. The conditions (8.3) say
that the observed correlations are to be replaced by the reproduced correlations,
implying the assumption of zero residuals.

In order to maximize a function of n variables when the variables are connected
by an arbitrary number of auxiliary equations, the method of Lagrange multipliers
[16, pp. 152-57] is particularly well adapted. This method is employed to
maximize $V_1$, which is a function of the n variables $a_{j1}$, under the $\tfrac{1}{2}n(n + 1)$
conditions (8.3) among all the coefficients $a_{jp}$. Let

(8.4)    $2T = V_1 - \sum_{j,k=1}^{n} \mu_{jk} r_{jk} = V_1 - \sum_{j,k=1}^{n} \sum_{p=1}^{m} \mu_{jk} a_{jp} a_{kp},$

where $\mu_{jk}$ ($= \mu_{kj}$) are the Lagrange multipliers. Then set the partial derivative of this
new function T with respect to any one of the n variables $a_{j1}$ equal to zero, namely,

(8.5)    $\dfrac{\partial T}{\partial a_{j1}} = a_{j1} - \sum_{k=1}^{n} \mu_{jk} a_{k1} = 0,$

and similarly put the partial derivative with respect to any of the other coefficients
$a_{jp}$ ($p \neq 1$) equal to zero, that is,

(8.6)    $\dfrac{\partial T}{\partial a_{jp}} = - \sum_{k=1}^{n} \mu_{jk} a_{kp} = 0 \qquad (p \neq 1).$

The two sets of equations (8.5) and (8.6) may be combined as follows:

(8.7)    $\dfrac{\partial T}{\partial a_{jp}} = \delta_{1p} a_{j1} - \sum_{k=1}^{n} \mu_{jk} a_{kp} = 0 \qquad (p = 1, 2, \ldots, m),$

where the Kronecker $\delta_{1p} = 1$ if $p = 1$ and $\delta_{1p} = 0$ if $p \neq 1$.

Multiply (8.7) by $a_{j1}$ and sum with respect to j, obtaining

(8.8)    $\delta_{1p} \sum_{j=1}^{n} a_{j1}^2 - \sum_{j=1}^{n} \sum_{k=1}^{n} \mu_{jk} a_{j1} a_{kp} = 0.$

Now, the expression $\sum_{j=1}^{n} \mu_{jk} a_{j1}$ is equal to $a_{k1}$ according to (8.5), and, setting
$\sum_{j=1}^{n} a_{j1}^2 = \lambda_1$, equation (8.8) may be written as follows:

(8.9)    $\delta_{1p} \lambda_1 - \sum_{k=1}^{n} a_{k1} a_{kp} = 0.$

Upon multiplying (8.9) by $a_{jp}$ and summing for p, this equation becomes

(8.10)    $a_{j1} \lambda_1 - \sum_{k=1}^{n} a_{k1} \left( \sum_{p=1}^{m} a_{jp} a_{kp} \right) = 0,$

or, upon applying the conditions (8.3),

(8.11)    $\sum_{k=1}^{n} r_{jk} a_{k1} - \lambda_1 a_{j1} = 0.$


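The n equations, one for each value of j, represented by the expression (8.11) may
be written in full as follows:

(8.12)
$$
\begin{aligned}
(h_1^2 - \lambda) a_{11} + r_{12} a_{21} + r_{13} a_{31} + \cdots + r_{1n} a_{n1} &= 0, \\
r_{21} a_{11} + (h_2^2 - \lambda) a_{21} + r_{23} a_{31} + \cdots + r_{2n} a_{n1} &= 0, \\
r_{31} a_{11} + r_{32} a_{21} + (h_3^2 - \lambda) a_{31} + \cdots + r_{3n} a_{n1} &= 0, \\
\cdots \qquad & \\
r_{n1} a_{11} + r_{n2} a_{21} + r_{n3} a_{31} + \cdots + (h_n^2 - \lambda) a_{n1} &= 0,
\end{aligned}
$$

where the parameter $\lambda_1$ of (8.11) is designated by $\lambda$ without a subscript.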

Thus, the maximization of (8.2) under the conditions (8.3) leads to the system of
n equations (8.12) for the solution of the n unknowns $a_{j1}$. A necessary and sufficient
condition for this system of n homogeneous equations to have a nontrivial solution
is the vanishing of the determinant of coefficients of the $a_{j1}$ [190, p. 174]. This condition
may be written:

(8.13)
$$
\begin{vmatrix}
(h_1^2 - \lambda) & r_{12} & r_{13} & \cdots & r_{1n} \\
r_{21} & (h_2^2 - \lambda) & r_{23} & \cdots & r_{2n} \\
r_{31} & r_{32} & (h_3^2 - \lambda) & \cdots & r_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
r_{n1} & r_{n2} & r_{n3} & \cdots & (h_n^2 - \lambda)
\end{vmatrix} = 0.
$$


If the determinant in (8.13) were expanded it would lead to an nth-order polynomial
in $\lambda$. In expanded form, or in determinantal form, an equation such as (8.13) is known
as a characteristic equation. Extensive mathematical theory has been developed
on the properties of characteristic equations [e.g., 376, 516]. For factor analysis, some
of the important properties include the fact that all roots are real and that a q-fold
multiple root substituted for $\lambda$ in (8.13) reduces the rank of the determinant to (n − q).
When a simple root of the characteristic equation is substituted for $\lambda$ in (8.12) a
set of homogeneous linear equations of rank (n − 1) is obtained. This set of equations
has a family of solutions, all of which are proportional to one particular solution.

From the above analysis, it follows that the factor of proportionality is $\lambda_1 = \sum_{j=1}^{n} a_{j1}^2$.
But this expression is precisely $V_1$, the quantity which is to be maximized. In other
words, $V_1$ is equal to one of the roots of the characteristic equation (8.13), namely,
the largest root $\lambda_1$.

The problem of finding the coefficients $a_{j1}$ of the first factor $F_1$, which will account
for as much of the total communality as possible, is then solved. The largest root $\lambda_1$
of (8.13) is substituted in (8.12), and any solution $\alpha_{11}, \alpha_{21}, \ldots, \alpha_{n1}$ is obtained. Then,
to satisfy the relation (8.2), these values are divided by the square root of the sum of
their squares and then multiplied by $\sqrt{\lambda_1}$. The resulting quantities are

(8.14)    $a_{j1} = \alpha_{j1} \sqrt{\lambda_1} \Big/ \sqrt{\alpha_{11}^2 + \alpha_{21}^2 + \cdots + \alpha_{n1}^2} \qquad (j = 1, 2, \ldots, n),$

which are the desired coefficients of $F_1$ in the factor pattern (8.1).
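
The steps from (8.11) to (8.14) can be illustrated numerically. The sketch below is an assumption-laden example, not the computing procedure of the text: the 3 × 3 reduced correlation matrix is hypothetical, and the largest root is found by plain power iteration, which is one simple way to obtain it. The scaling step follows (8.14).

```python
import math

# Hypothetical reduced correlation matrix (communality estimates on the diagonal).
R = [[0.70, 0.60, 0.50],
     [0.60, 0.60, 0.45],
     [0.50, 0.45, 0.50]]

def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def largest_root(M, iterations=500):
    """Power iteration: return (largest eigenvalue, unit-length eigenvector)."""
    v = [1.0] * len(M)
    for _ in range(iterations):
        w = matvec(M, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(v, matvec(M, v)))  # Rayleigh quotient
    return lam, v

lam1, alpha = largest_root(R)

# Scaling (8.14): divide by the root of the sum of squares, multiply by sqrt(lambda_1).
root_ss = math.sqrt(sum(x * x for x in alpha))
a1 = [x * math.sqrt(lam1) / root_ss for x in alpha]

V1 = sum(x * x for x in a1)  # contribution of the first factor, as in (8.2)
residual = [ra - lam1 * a for ra, a in zip(matvec(R, a1), a1)]  # left side of (8.11)

print(round(lam1, 4), [round(x, 4) for x in a1], round(V1, 4))
```

After scaling, the sum of squares of the first-factor coefficients equals the largest root, and the left-hand side of (8.11) vanishes to within rounding.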

In the mathematical literature the roots ($\lambda$’s) of a characteristic equation (8.13)
are referred to as “eigenvalues”. The solution to the set of equations (8.12) corresponding
to each eigenvalue leads to a vector (a set of $\alpha$’s) which is called an “eigenvector”.
The generalized mathematical problem is usually expressed in the form: Find a
number $\lambda$ and an n-dimensional vector $\mathbf{q} \neq \mathbf{0}$ such that

(8.15)    $\mathbf{R}\mathbf{q} = \lambda \mathbf{q}.$

Any number $\lambda_p$ satisfying this equation is called an eigenvalue of R and its associated
vector $\mathbf{q}_p = \{\alpha_{1p}, \alpha_{2p}, \ldots, \alpha_{np}\}$ is called an eigenvector of R. An eigenvector scaled
according to (8.14) is designated $\mathbf{a}_p = \{a_{1p}, a_{2p}, \ldots, a_{np}\}$.

The foregoing expression may be viewed another way. The term Rq represents a
transformation of the vector q, and (8.15) says that this transformed vector is proportional
to the original one, with $\lambda$ the proportionality factor. In general, of course, it
is not to be expected that the transformed vector would be proportional to the
original one. However, when that is the case then a quantity $\lambda$ exists such that (8.15)
is satisfied. Such a $\lambda$ must be a root of the characteristic equation (8.13). For each
root $\lambda_p$ the system of equations (8.12) has a non-zero solution $\mathbf{q}_p = \{\alpha_{1p}, \alpha_{2p}, \ldots, \alpha_{np}\}$.
Furthermore, the n roots $\lambda_1, \lambda_2, \ldots, \lambda_n$ lead to n vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n$ so that (8.15)
may be written:

(8.16)    $\mathbf{R}(\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n) = (\lambda_1 \mathbf{q}_1, \lambda_2 \mathbf{q}_2, \ldots, \lambda_n \mathbf{q}_n),$

or, upon constituting a matrix Q of the n vectors,

(8.17)    $\mathbf{R}\mathbf{Q} = \mathbf{Q}\boldsymbol{\Lambda},$

where $\boldsymbol{\Lambda}$ is the diagonal matrix of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$.

When the analysis is in terms of principal components (with unities rather than
communalities in the principal diagonal of R), then the vectors in Q are linearly
independent so that their determinant is different from zero and Q has an inverse.
Then from (8.17) it follows that

(8.18)    $\mathbf{Q}^{-1}\mathbf{R}\mathbf{Q} = \boldsymbol{\Lambda},$

which brings R into diagonal form in which the elements of $\boldsymbol{\Lambda}$ are the eigenvalues
and the columns of Q are the eigenvectors. Furthermore, since the correlation
matrix is symmetric, i.e., $\mathbf{R} = \mathbf{R}'$, transposing in the expression (8.18) yields:

(8.19)    $\mathbf{Q}'\mathbf{R}(\mathbf{Q}^{-1})' = \boldsymbol{\Lambda}.$

From this expression it follows that the rows of $(\mathbf{Q}^{-1})'$ are the eigenvectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n$
and that therefore Q is orthogonal, with the property:

(8.20)    $\mathbf{Q}\mathbf{Q}' = \mathbf{I}$ or $\mathbf{Q}^{-1} = \mathbf{Q}'.$

Then, for R symmetric, (8.18) becomes:

(8.21)    $\mathbf{Q}'\mathbf{R}\mathbf{Q} = \boldsymbol{\Lambda}.$

This expression, sometimes referred to as the Spectral Theorem [450, pp. 222-24],
says that the matrix R of any symmetric (quadratic) form may be diagonalized by
means of an orthogonal transformation Q and that the resulting elements of $\boldsymbol{\Lambda}$ and
Q are real with the n eigenvectors linearly independent.
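
A small worked case may make (8.20) and (8.21) concrete. The following is an illustration for two standardized variables with an arbitrary correlation r, not an example from the text; the particular Q shown is the standard orthogonal matrix of unit eigenvectors for this case.

```latex
\[
\mathbf{R} = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
\mathbf{Q} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
\mathbf{Q}\mathbf{Q}' = \frac{1}{2} \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \mathbf{I},
\]
\[
\mathbf{R}\mathbf{Q} = \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 + r & 1 - r \\ 1 + r & -(1 - r) \end{pmatrix}, \qquad
\mathbf{Q}'\mathbf{R}\mathbf{Q} =
\begin{pmatrix} 1 + r & 0 \\ 0 & 1 - r \end{pmatrix} = \boldsymbol{\Lambda}.
\]
```

Here the eigenvalues are $1 + r$ and $1 - r$, both real, and their sum is the total variance 2.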

When the analysis is in terms of principal factors (based upon estimates of communalities
in R), then only m (less than n) positive eigenvalues and associated real
eigenvectors are obtained. When these eigenvectors are scaled according to (8.14)
they are designated by $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m$ instead of by q’s, and the matrix of these m
vectors by A. (There are still n elements of each of these column vectors so that A is
of dimension n × m.) While A does not have an inverse, the corresponding “orthogonality”
property is

(8.22)    $\mathbf{A}'\mathbf{A} = \boldsymbol{\Lambda}_m = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m),$

or, in expanded algebraic form:

(8.23)    $\sum_{j=1}^{n} a_{jp}^2 = \lambda_p, \qquad \sum_{j=1}^{n} a_{jp} a_{jq} = 0 \qquad (p, q = 1, 2, \ldots, m;\ p \neq q).$

It will be noted that (8.9) represents a special instance of the orthogonality property.
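
The orthogonality property can be checked on a small constructed case. In the sketch below the two-factor pattern A is hypothetical, with columns chosen to be orthogonal so that they behave as scaled eigenvectors; the reduced correlation matrix is then reproduced from A as in (8.3).

```python
# Hypothetical two-factor pattern for four variables (not from the text).
A = [[0.8,  0.3],
     [0.8, -0.3],
     [0.6,  0.4],
     [0.6, -0.4]]
n, m = len(A), len(A[0])

# Reduced correlation matrix reproduced from the pattern, as in (8.3).
R = [[sum(A[j][p] * A[k][p] for p in range(m)) for k in range(n)]
     for j in range(n)]

# A'A should be diagonal, with the eigenvalues on the diagonal, as in (8.22).
AtA = [[sum(A[j][p] * A[j][q] for j in range(n)) for q in range(m)]
       for p in range(m)]
eigenvalues = [AtA[p][p] for p in range(m)]

# Each column should also satisfy R a_p = lambda_p a_p, as in (8.15).
max_error = max(
    abs(sum(R[j][k] * A[k][p] for k in range(n)) - eigenvalues[p] * A[j][p])
    for j in range(n) for p in range(m)
)
trace = sum(R[j][j] for j in range(n))  # total communality

print(AtA, eigenvalues, trace, max_error)
```

In this constructed case the eigenvalues are 2.0 and 0.5, and their sum equals the total communality 2.5 in the diagonal of R.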

The digression in the last four paragraphs gave some of the highlights of the
mathematical theory as it pertains to component analysis and to principal-factor
analysis. Returning to the formal development, the coefficients $a_{j1}$ of the first factor
$F_1$ were determined in (8.14). Conceptually, it helps to think of the method in terms
of one principal factor at a time, even though in practice they are all computed
simultaneously. The next problem is to find a factor which will account for a maximum
of the residual communality. In order to do this, it is necessary to obtain the first-factor
residual correlations. Furthermore, in obtaining still other factors the residual
correlations with two, three, ..., (m − 1) factors removed are employed, and hence
a suitable notation is required. A convenient notation for the residual correlation of
$r_{jk}$ with s factors removed is ${}_s r_{jk}$. Thus, when the first factor has been obtained, the
first-factor residuals become

(8.24)    ${}_1 r_{jk} = r_{jk} - a_{j1} a_{k1} = a_{j2} a_{k2} + a_{j3} a_{k3} + \cdots + a_{jm} a_{km}.$

More generally, the matrix of the first-factor residuals may be expressed by:

(8.25)    $\mathbf{R}_1 = \mathbf{R} - \mathbf{R}_1^{\dagger},$

where

(8.26)    $\mathbf{R}_1^{\dagger} = \mathbf{a}_1 \mathbf{a}_1'$

represents the n × n symmetric matrix of products of first-factor coefficients, i.e., the
reproduced correlations from the first factor alone.

In determining the coefficients of the second factor $F_2$, it is necessary to maximize
the quantity

(8.27)    $V_2 = a_{12}^2 + a_{22}^2 + \cdots + a_{n2}^2,$

which is the sum of the contributions of $F_2$ to the residual communality. This maximization
is subject to the conditions (8.24), which are analogous to the restrictions
(8.3) in the case of the first factor. The theory of characteristic equations provides the
basis for determining the coefficients of the second and subsequent factors. It is not
necessary to carry through an analysis for maximizing the contributions of $F_2$ to
the residual communality. Instead, it will be shown that the required maximum
eigenvalue of $\mathbf{R}_1$ is, in fact, the second largest eigenvalue of the original correlation
matrix R.

If $a_p$ stand for the m eigenvectors of R (properly scaled), it can be determined
whether they are also eigenvectors of $R_1$. Postmultiplying the matrix $R_1$ by any
vector $a_p$ yields

(8.28)    $R_1 a_p = (R - a_1 a_1') a_p$

from the definition (8.25) of the residual matrix. Expanding this expression, and
applying (8.15) produces:

(8.29)    $R_1 a_p = R a_p - a_1 a_1' a_p = \lambda_p a_p - a_1 a_1' a_p$.

Now consider the two cases: p = 1 and p ≠ 1. (a) When p = 1, $a_1' a_1 = \lambda_1$ according
to (8.9), so that the above expression reduces to

(8.30)    $R_1 a_1 = 0$.

In other words, the eigenvector corresponding to the largest eigenvalue of R is
also an eigenvector of $R_1$, but its associated eigenvalue in $R_1$ is zero. (b) When p ≠ 1,
$a_1' a_p = 0$ according to (8.9), and expression (8.29) becomes:

(8.31)    $R_1 a_p = \lambda_p a_p - a_1 \cdot 0 = \lambda_p a_p$    (p ≠ 1),

which says that, except for $\lambda_1$, the eigenvalues of $R_1$ are identical with those of R
and their associated eigenvectors are also identical. The expressions (8.30) and
(8.31) prove that the eigenvectors of $R_1$ are identical with those of R, and that they
have corresponding eigenvalues, except that the eigenvalue corresponding to the
eigenvector $a_1$ in $R_1$ is zero in place of the $\lambda_1$ in R.
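
The argument of (8.28) to (8.31) can be checked numerically on a small case. The following Python sketch is an illustration only and is not taken from the text: it assumes a hypothetical 2 × 2 correlation matrix whose eigenvalues (1.5 and 0.5) and eigenvectors are known in closed form, scales the eigenvectors so that $a_p' a_p = \lambda_p$, and forms the residual matrix. The helper name `matvec` is an assumption.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical 2 x 2 correlation matrix (unities in the diagonal).
R = [[1.0, 0.5],
     [0.5, 1.0]]

# Eigenvalues and unit eigenvectors of this particular R, known in closed form.
lam1, lam2 = 1.5, 0.5
u1 = [1 / math.sqrt(2), 1 / math.sqrt(2)]
u2 = [1 / math.sqrt(2), -1 / math.sqrt(2)]

# Scale so that a_p' a_p = lambda_p (the factor-coefficient scaling).
a1 = [math.sqrt(lam1) * x for x in u1]
a2 = [math.sqrt(lam2) * x for x in u2]

# Residual matrix R_1 = R - a_1 a_1'.
R1 = [[R[j][k] - a1[j] * a1[k] for k in range(2)] for j in range(2)]

R1_a1 = matvec(R1, a1)   # case (a): should vanish, as in (8.30)
R1_a2 = matvec(R1, a2)   # case (b): should equal lam2 * a2, as in (8.31)
```

In this example `R1_a1` is zero up to rounding, while `R1_a2` reproduces `lam2 * a2`, so the second eigenvalue of R is the largest eigenvalue of the residual matrix.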

From the foregoing it is clear that the $\lambda_2$ of R is the largest eigenvalue of $R_1$. In
other words, to obtain the coefficients of the second factor $F_2$ from the largest
eigenvalue of the residual matrix $R_1$ it suffices to extract the second largest eigenvalue of
the original matrix R. By the same type of argument, the successive eigenvalues and
their associated eigenvectors are obtained directly from the original correlation
matrix R, until m factors have been extracted.

When unities are placed in the principal diagonal of R then usually m = n. If some
numbers less than unities (estimates of communalities) are placed in the diagonal,
and the positive semi-definite property of R is preserved, then m will usually be less
than n, and all eigenvalues will be real and nonnegative. However, the reduced
correlation matrix R (i.e., with communality estimates in the diagonal) will not be
positive semi-definite in practice, and both positive and negative eigenvalues may be
expected. Of course, the negative eigenvalues, and the associated imaginary
eigenvectors, must be extraneous to a practical problem. Even to retain all the real
eigenvectors would be an over-factorization because the sum of the positive eigenvalues is
greater than the original sum of communalities (the negative eigenvalues will reduce
that sum to the starting value). Since the total communality for the n variables is the
trace of the reduced correlation matrix, the factorization process should be stopped
when the sum of the eigenvalues is equal to this value. The investigator will usually
be satisfied with an even smaller number of factors, as indicated in the examples
below.
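
The stopping rule just described can be sketched in a few lines of Python. The eigenvalues below are hypothetical, chosen only so that the negative roots bring the total back down to the trace; the function name `factors_to_retain` is likewise an assumption, not something defined in the text.

```python
def factors_to_retain(eigenvalues, total_communality):
    """Count the leading roots needed for their running sum to reach the trace.

    `eigenvalues` must be sorted in descending order; `total_communality`
    is the trace of the reduced correlation matrix.
    """
    running = 0.0
    for count, root in enumerate(eigenvalues, start=1):
        running += root
        if running >= total_communality:
            return count
    return len(eigenvalues)

# Hypothetical roots of a reduced correlation matrix of order 6.
roots = [3.10, 1.20, 0.25, 0.05, -0.10, -0.30]
trace = sum(roots)                                # all roots together give the trace
positive_sum = sum(r for r in roots if r > 0)     # exceeds the trace
m = factors_to_retain(roots, trace)               # upper limit on the number of factors
```

With these illustrative roots the first two already account for the trace, although four roots are positive; retaining all four would be the over-factorization mentioned above.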

The theoretical development of this section provides the logical basis for the 
principal-factor solution, but it does not furnish an actual means of computation. 
The direct solution of a characteristic equation (8.13) and sets of linear homogeneous
equations (8.12) would entail insuperable algebraic efforts. Practical means of
obtaining principal-factor solutions are discussed in the following sections.

8.4. Additional Theory 

A method of determining the principal axes is developed in this section, based upon 
two fundamental papers of Hotelling [259,261]. This method involves an iterative 
scheme which yields a root of the characteristic equation and the coefficients of the 
associated factor simultaneously. The roots appear in descending order of magnitude 
upon successive applications of the method. For this reason the method is especially 
suitable in practical situations where only a few of the largest characteristic roots 
and the associated factor coefficients are required. Usually a small number of roots
will account for the total (estimated) communality. 

The iterative process is begun by selecting an arbitrary set of n numbers, and
transforming them again and again by use of the observed correlations until they converge
to the desired coefficients of the first principal factor. As noted in the last section, an
operation $Rq_1$ represents a transformation of the vector
$q_1 = \{\alpha_{11}, \alpha_{21}, \cdots, \alpha_{n1}\}$ into
some new vector. Geometrically, this may be thought of as a rotation of a line through
the origin (for which the $\alpha_{j1}$ are proportional to the direction cosines) to a new line
for which numbers proportional to the direction cosines are in the column vector
$(Rq_1)$. In general, the new line will be distinct from the original one. However, when a
line remains stationary under such a transformation, then the new vector must be
proportional to the original vector as indicated in (8.15). This may be expressed by
the matrix equation:

(8.32)    $(R - \lambda I) q_1 = 0$,

or, in expanded algebraic form:

(8.33)    $r_{j1}\alpha_{11} + r_{j2}\alpha_{21} + \cdots + (h_j^2 - \lambda)\alpha_{j1} + \cdots + r_{jn}\alpha_{n1} = 0$,

remembering that the communality is used in place of the self correlation. As j takes
the values 1 to n, it is readily seen that these equations are identical with (8.12). Thus,
for any invariant line, the direction cosines are proportional to a solution of (8.12),
where $\lambda$ is a root of the characteristic equation (8.13). Hence it follows that the invariant
lines are the desired principal axes. It is thus apparent that, if a set of numbers
$\alpha_{11}, \alpha_{21}, \cdots, \alpha_{n1}$ can be found which lead to (8.32), the numbers $Rq_1$ are proportional
to the direction cosines of the principal axes. The coefficients of one of the principal
factors can be obtained from the latter set of numbers. Furthermore, $\lambda$ is the sum of
the contributions of this factor to the communalities of the variables.

In practice, of course, it cannot be expected that the arbitrary numbers $\alpha_{j1}$ will
be so selected as to be proportional to the direction cosines of one of the principal
axes. The iterative process then involves the use of the derived numbers $Rq_1$ as a new
set of arbitrary numbers in place of $\alpha_{j1}$, and the transformation of the new vector
$Rq_1$ becomes $R(Rq_1)$. Formal matrix manipulation of this transformation reduces
it to:

(8.34)    $R(Rq_1) = R(\lambda q_1) = \lambda(Rq_1) = \lambda^2 q_1$,

upon repeated application of (8.32) and factoring out the scalar $\lambda$ according to 3.2,
paragraph 21. This transformation effectively does the job of two iterations.

The property exhibited in (8.34) may be generalized to any power, say m, for any
vector q, as follows:

(8.35)    $R^m q = \lambda^m q$.

Then the improvement in the iteration process need not end with the employment
of $R^2$. After doubling the speed of convergence by squaring R, it can be doubled
again by squaring $R^2$, i.e., by multiplying a set of trial values by $R^4$, and thus the
equivalent of four multiplications by R is obtained. Upon squaring again, a matrix
$R^8$ is obtained, and multiplication by it is equivalent to eight multiplications by R,
and so forth to any power of the correlation matrix. This squaring process is
continued until the convergence is so rapid that additional matrix squaring is not worth
while. A scheme for determining the number of times a matrix should be squared is
developed in the next section, in connection with the actual computing procedures.
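
The equivalence between repeated multiplication by R and a single multiplication by a power of R can be illustrated as follows. This Python sketch is not from the text: the 3 × 3 correlation matrix is hypothetical, the helper names are assumptions, and division by the largest element is used only as a convenient scaling of the trial values.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    """Multiply matrix M by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def divide_by_largest(v):
    """Rescale a trial vector so that its largest element (in absolute value) is 1."""
    pivot = max(v, key=abs)
    return [x / pivot for x in v]

# Hypothetical 3 x 3 correlation matrix.
R = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.5],
     [0.3, 0.5, 1.0]]

q = [1.0, 1.0, 1.0]                       # arbitrary starting numbers

# Four successive multiplications by R ...
q_slow = q
for _ in range(4):
    q_slow = divide_by_largest(matvec(R, q_slow))

# ... give the same ratios as one multiplication by R^4 (R squared twice).
R2 = matmul(R, R)
R4 = matmul(R2, R2)
q_fast = divide_by_largest(matvec(R4, q))
```

Two matrix squarings followed by one matrix-vector product therefore replace four matrix-vector products, which is the saving the squaring process aims at when many iterations would otherwise be needed.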

The iterative process is continued until the ratios among the quantities obtained
at any stage converge to the corresponding ratios among the coefficients of $F_1$ to any
specified degree of accuracy. The proof of the convergence of these ratios to those of
the coefficients $a_{j1}$ of the first principal factor is given by Hotelling [259, Sec. 4]. A
convenient procedure is to divide each of the trial values by a fixed one of them, say
the largest. Then the next value obtained, corresponding to this number, will be an
approximation to the characteristic root $\lambda_1$.

The second and remaining principal factors may be determined by the same method,
and the convergence can be accelerated by the use of a convenient power of the
matrix of residual correlations. It is not necessary, however, to obtain this power of
the residual matrix by repeated squarings, as was done in the case of the original
matrix of correlations. Instead, the determination already made of the power of R
and the following algebraic properties of matrices can be employed for this purpose.
In getting the square of the residual matrix algebraically from (8.25), namely:

(8.36)    $R_1^2 = R^2 - 2RR_1^{\dagger} + R_1^{\dagger 2}$,

it is possible to express the last two terms by quantities already known, so that the
actual squaring of $R_1$ is obviated. Thus, from the definition (8.26):

    $R_1^{\dagger 2} = (a_1 a_1')(a_1 a_1')$
                     $= a_1 (a_1' a_1) a_1'$
                     $= a_1 \lambda_1 a_1'$,    from (8.22),
                     $= \lambda_1 a_1 a_1'$,    since $\lambda_1$ is a scalar,

and applying definition (8.26) to the last expression produces:

(8.37)    $R_1^{\dagger 2} = \lambda_1 R_1^{\dagger}$.

Also, from the expression (8.15) for the particular case of the first principal factor
(with coefficients $a_1$) of the correlation matrix R, the following relationship

    $R a_1 = \lambda_1 a_1$

provides the basis for expressing $RR_1^{\dagger}$ in terms of known quantities. Postmultiplying
by $a_1'$ and applying definition (8.26) yields:

(8.38)    $RR_1^{\dagger} = \lambda_1 R_1^{\dagger}$.

Upon substituting the known quantities from (8.37) and (8.38) for the terms in (8.36),
the square of the residual matrix becomes:

(8.39)    $R_1^2 = R^2 - \lambda_1 R_1^{\dagger}$.

In other words, actual squaring of the residual matrix is not necessary since the square
of the correlation matrix and the matrix of products of first-factor coefficients are
available.

In similar fashion, it can be shown that for any positive integer e,

(8.40)    $R_1^e = R^e - \lambda_1^{e-1} R_1^{\dagger}$.

Thus the e th power of the residual matrix is expressed in terms of the e th power of
the original correlation matrix, obviating actual multiplications of the residual
matrix.
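
The relation (8.40) can be verified numerically on a small case. The sketch below is illustrative only: it assumes a hypothetical 2 × 2 correlation matrix with known largest root 1.5, and the helper names `matmul` and `matpow` are assumptions.

```python
import math

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(M, e):
    """e-th power of M by repeated multiplication (e a positive integer)."""
    result = M
    for _ in range(e - 1):
        result = matmul(result, M)
    return result

# Hypothetical correlation matrix with largest root 1.5 and eigenvector (1, 1).
R = [[1.0, 0.5],
     [0.5, 1.0]]
lam1 = 1.5
a1 = [math.sqrt(lam1 / 2)] * 2            # first-factor coefficients, a1'a1 = lam1

# Product matrix a_1 a_1' and residual matrix R_1 = R - a_1 a_1'.
P = [[a1[j] * a1[k] for k in range(2)] for j in range(2)]
R1 = [[R[j][k] - P[j][k] for k in range(2)] for j in range(2)]

e = 4
left = matpow(R1, e)                      # direct e-th power of the residual matrix
Re = matpow(R, e)
right = [[Re[j][k] - lam1 ** (e - 1) * P[j][k] for k in range(2)]
         for j in range(2)]               # right-hand side of (8.40)
```

Both sides agree to rounding error in this example, so the power of the residual matrix is available without multiplying the residual matrix by itself.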

From the foregoing development the order of procedure of the iterative scheme
may be summarized. Using $R^e$ as the basis for selecting the set of trial values, this
set rapidly yields the values of the first-factor coefficients* and the characteristic root
$\lambda_1$. Furthermore, the value $\lambda_1^e$ will be determined from the multiplication of the set
of trial values by $R^e$, and $\lambda_1^{e-1}$ can be obtained by division. Then multiplying $\lambda_1^{e-1}$
by each element of $R_1^{\dagger}$ and subtracting from the corresponding element of $R^e$, the
e th power of the residual matrix is obtained. The second-factor coefficients are
obtained from $R_1$ and $R_1^e$ in the same manner as the first-factor coefficients are
determined from R and $R^e$. It may not be necessary to employ the e th power of $R_1$
in calculating the second-factor coefficients when rapid convergence is evident. Then
some lower power of $R_1$, or $R_1$ itself, is employed. This is illustrated in the next
section. To calculate the third-factor coefficients, the matrices $R_2$ of second-factor
residuals and $R_2^e$ (or some lower power of $R_2$) are employed. The latter matrix is
obtained conveniently by an expression of the form (8.40) relating the second- to
the first-factor residuals. Further factors are determined similarly until approximately
all the communality is analyzed.
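
The order of procedure summarized above can be sketched as a short Python routine. This is a simplified illustration under stated assumptions, not the desk-calculator scheme itself: it iterates directly with the matrix rather than with a power of it, uses a hypothetical 3 × 3 correlation matrix, starts from arbitrary numbers 1, 1/2, 1/3, and the function names are assumptions.

```python
import math

def matvec(M, v):
    """Multiply matrix M by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def largest_root(M, iterations=200):
    """Return (root, coefficients) for the largest root of the symmetric matrix M."""
    q = [1.0 / (j + 1) for j in range(len(M))]   # arbitrary starting numbers
    root = 0.0
    for _ in range(iterations):
        w = matvec(M, q)
        root = max(w, key=abs)        # largest derived number; tends to the root
        q = [x / root for x in w]     # divide by the largest to get new trial values
    scale = math.sqrt(root / sum(x * x for x in q))
    return root, [scale * x for x in q]   # squares of the coefficients sum to the root

def extract_factors(R, m):
    """Extract m factors in turn, removing each one from the residual matrix."""
    n = len(R)
    residual = [row[:] for row in R]
    roots, factors = [], []
    for _ in range(m):
        root, a = largest_root(residual)
        roots.append(root)
        factors.append(a)
        residual = [[residual[j][k] - a[j] * a[k] for k in range(n)]
                    for j in range(n)]
    return roots, factors, residual

# Hypothetical correlation matrix with unities in the diagonal.
R = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.5],
     [0.3, 0.5, 1.0]]
roots, factors, residual = extract_factors(R, 2)
```

The roots come out in descending order, and the trace of the final residual matrix equals the original trace less the roots already extracted, which is the quantity watched when deciding whether approximately all the communality has been analyzed.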

8.5. Computing Procedures With Desk Calculator 

A principal-factor pattern can be obtained for any matrix of correlations. Since 
the work entailed is very laborious for problems with many variables, procedures 
employed with high-speed electronic computers are indicated in 8.6. For a limited 
number of variables, the method of 8.4 using desk calculators is feasible. The numerical 
calculations are exhibited in the following outline form, employing the example of 
eight physical variables of Section 5.4 for illustrative purposes. 

1. Organization of data.—Because of the symmetry of the correlation matrix,
it usually suffices to write only half the table. For the principal-factor method,
however, where squaring of the correlation matrix is involved, it will be found more
convenient to write the symmetric matrix in full. Thus, for the example, the
correlation [...] numbers directly proportional to the desired factor coefficients. [...] only the
diagonal and the elements above (or below) it [...], since the square of a symmetric
matrix is also symmetric. The complete matrix is written, however, for convenience of
further squaring.

* Burt [54] points out that to factor a matrix $R^e$ is equivalent to obtaining a Spearman general
factor. This arises from the fact that, with a sufficient number of self-multiplications, any
symmetric matrix can be reduced as closely as desired to a matrix of rank one.

Table 8.3 

Square of Correlation Matrix: R 2 
A check is available on the squaring process. Compute the product of R by the
column of values $S_j$ of Table 8.2. These values,

    $T_j^{(2)} = \sum_{k=1}^{8} r_{kj} S_k$    (j = 1, 2, ..., 8),

should agree, except for errors of rounding, with the respective sums $S_j^{(2)}$ of the rows
of $R^2$. [...] Square the matrix of correlations a sufficient number of times to make the
successive sets of trial values approximately equal. In calculating the various powers
of R, always determine the check column [...] before the elements of $R^e$ are
obtained. Then the values of $S_j^{(e)}$ may be computed, and if they agree with the values
[...] are calculated. [...] the elements of $R^4$ are computed in Table 8.4. Next the
values [...] in Table 8.5. Since the maximum difference between [...], it is therefore not
necessary to calculate the entries of [...].


Table 8.4

Fourth Power of Correlation Matrix: $R^4$

 j      1      2      3      4      5      6      7      8    $S_j^{(4)}$  $T_j^{(4)}$  $\alpha_{j1}^{(4)}$
 1    65.54  64.94  61.95  63.06  56.04  47.80  42.00  46.59   447.92   447.93   1.0000
 2    64.94  64.38  61.41  62.49  55.27  47.13  41.41  45.98   443.01   443.02    .9890
 3    61.95  61.41  58.59  59.61  52.67  44.91  39.45  43.82   422.41   422.43    .9431
 4    63.06  62.49  59.61  60.67  53.86  45.93  40.36  44.78   430.76   430.76    .9617
 5    56.04  55.27  52.67  53.86  50.40  43.06  37.97  41.61   390.88   390.90    .8727
 6    47.80  47.13  44.91  45.93  43.06  36.80  32.45  35.55   333.63   333.64    .7448
 7    42.00  41.41  39.45  40.36  37.97  32.45  28.62  31.33   293.59   293.60    .6555
 8    46.59  45.98  43.82  44.78  41.61  35.55  31.33  34.39   324.05   324.05    .7234
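
The check column for the next power can be formed from Table 8.4 without squaring the matrix again. The Python sketch below assumes, by analogy with the check described for $R^2$, that the check values for $R^8$ are the products of the rows of $R^4$ with the column of row sums of $R^4$; the entries are copied from Table 8.4 and the variable names are assumptions.

```python
# Entries of R^4 and its row sums S_j^(4), copied from Table 8.4.
R4 = [
    [65.54, 64.94, 61.95, 63.06, 56.04, 47.80, 42.00, 46.59],
    [64.94, 64.38, 61.41, 62.49, 55.27, 47.13, 41.41, 45.98],
    [61.95, 61.41, 58.59, 59.61, 52.67, 44.91, 39.45, 43.82],
    [63.06, 62.49, 59.61, 60.67, 53.86, 45.93, 40.36, 44.78],
    [56.04, 55.27, 52.67, 53.86, 50.40, 43.06, 37.97, 41.61],
    [47.80, 47.13, 44.91, 45.93, 43.06, 36.80, 32.45, 35.55],
    [42.00, 41.41, 39.45, 40.36, 37.97, 32.45, 28.62, 31.33],
    [46.59, 45.98, 43.82, 44.78, 41.61, 35.55, 31.33, 34.39],
]
S4 = [447.92, 443.01, 422.41, 430.76, 390.88, 333.63, 293.59, 324.05]

# Check column for the eighth power: each row of R^4 times the column S^(4).
T8 = [sum(r * s for r, s in zip(row, S4)) for row in R4]

# Trial values: divide every check value by the largest of them.
largest = max(T8)
alpha8 = [t / largest for t in T8]
```

The values obtained for variables 1 and 5 agree to rounding with the entries 176738 and 153733 (trial value .8698) recorded for the eighth power in Table 8.5, so the trial values are available without writing out the entries of $R^8$.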


Table 8.5

Eighth Power of Correlation Matrix: $R^8$

 j    1   2   3   4   5   6   7   8   $S_j^{(8)}$   $T_j^{(8)}$   $\alpha_{j1}^{(8)}$
 1    —   —   —   —   —   —   —   —      —       176738      1.0000
 2    —   —   —   —   —   —   —   —      —       174853       .9893
 3    —   —   —   —   —   —   —   —      —       166733       .9434
 4    —   —   —   —   —   —   —   —      —       169980       .9618
 5    —   —   —   —   —   —   —   —      —       153733       .8698
 6    —   —   —   —   —   —   —   —      —       131201       .7423
 7    —   —   —   —   —   —   —   —      —       115430       .6531
 8    —   —   —   —   —   —   —   —      —       127505       .7214

[...] the last set of trial values is used as the trial vector $q_1$ to make certain that the
squaring process has indeed produced an invariant line [...] the derived numbers are
immediately proportional to the $a_{j1}$ according to [...]. In Table 8.6 the first column of
arbitrary numbers are the values $\alpha_{j1}^{(8)}$ determined by the squaring process. These
numbers are then multiplied by R. After all the $Rq_1$ values have been obtained, they
are divided by the largest of them to get the new quantities $\alpha_{j1}$ (the same notation
being used for simplicity). Since the maximum discrepancy between the old and the
new values $\alpha_{j1}$ is only .0001, these numbers are accepted as stationary. The value of
$Rq_1$ corresponding to $\alpha_{11} = 1.0000$ is the first characteristic root $\lambda_1$. Then the
coefficients of the first factor can be calculated by means of (8.14), as indicated in the
last column of Table 8.6.

Table 8.6

Calculation of the $F_1$ Coefficients

 Variable j    $\alpha_{j1}$    $Rq_1$    $\alpha_{j1}$    $a_{j1}$
     1        1.0000    4.4556    1.0000     .858
     2         .9893    4.4083     .9894     .849
     3         .9434    4.2038     .9435     .810
     4         .9618    4.2852     .9618     .825
     5         .8698    3.8757     .8698     .747
     6         .7423    3.3076     .7423     .637
     7         .6531    2.9099     .6531     .561
     8         .7214    3.2142     .7214     .619

$\lambda_1 = 4.4556$,    $\sum_{j=1}^{8} \alpha_{j1}^2 = 6.0487$,    $\sqrt{\lambda_1} / \sqrt{\sum \alpha_{j1}^2} = .85827$
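
The last step, from the stationary trial values to the factor coefficients, can be reproduced with the figures of Table 8.6. The Python sketch below assumes that (8.14) amounts to multiplying each trial value by $\sqrt{\lambda_1}$ divided by the square root of the sum of squares of the trial values, which is what the multiplier .85827 under Table 8.6 suggests; the variable names are assumptions.

```python
import math

lam1 = 4.4556    # first characteristic root, from Table 8.6
# Stationary trial values (third column of Table 8.6).
alpha = [1.0000, 0.9894, 0.9435, 0.9618, 0.8698, 0.7423, 0.6531, 0.7214]

sum_sq = sum(x * x for x in alpha)        # sum of squares of the trial values
scale = math.sqrt(lam1 / sum_sq)          # common multiplier for all trial values
a1 = [round(scale * x, 3) for x in alpha] # first-factor coefficients, three decimals

contribution = sum(x * x for x in a1)     # should be close to lam1
```

With these inputs the sum of squares is about 6.0487, the multiplier about .85827, and the rounded coefficients agree with the last column of Table 8.6.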

A check on the final determination of the $a_{j1}$ is provided by

    $\sum_{j=1}^{8} a_{j1}^2 = \lambda_1$.

In other words, the sum of the contributions of the first factor to the total
communality must be equal to the first characteristic root. The value of $\lambda_1$ from the
analysis is 4.4556 and the sum of the squares of the coefficients is 4.455, so that the
check is satisfied (within rounding errors).

4. First-factor residuals.—To assist in calculating the residuals with the first
factor removed, a table of products of the first-factor coefficients may be prepared.
Such a product matrix $R_1^{\dagger}$ for the given data is presented in Table 8.7, where the
symmetric elements above the diagonal have been omitted for simplicity.


Table 8.7

Product Matrix: $R_1^{\dagger} = (a_{j1} a_{k1})$

 j     1     2     3     4     5     6     7     8    $E_{j1}$   $a_{j1} D_1$
 1   .736    —     —     —     —     —     —     —    5.067    5.067
 2   .728  .721    —     —     —     —     —     —    5.014    5.014
 3   .695  .688  .656    —     —     —     —     —    4.783    4.784
 4   .708  .700  .668  .681    —     —     —     —    4.873    4.872
 5   .641  .634  .605  .616  .558    —     —     —    4.411    4.412
 6   .547  .541  .516  .526  .476  .406    —     —    3.763    3.762
 7   .481  .476  .454  .463  .419  .357  .315    —    3.312    3.313
 8   .531  .526  .501  .511  .462  .394  .347  .383   3.655    3.656

To check the calculation of the elements of the product matrix $R_1^{\dagger}$, obtain the sums
of the rows (for the matrix with all elements included) and compare with the
corresponding values of $a_{j1} D_1$ where

    $D_1 = \sum_{k=1}^{8} a_{k1}$.

The sum of the first-factor coefficients for the given data is $D_1 = 5.906$. The sums $E_{j1}$
and check sums $a_{j1} D_1$ are also recorded in Table 8.7.
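
The row-sum check on the product matrix rests on the identity that the sum of row j of the products $a_{j1} a_{k1}$ equals $a_{j1}$ times the sum of all the coefficients. A brief Python sketch with the first-factor coefficients of Table 8.6 follows; the variable names are assumptions.

```python
# First-factor coefficients from the last column of Table 8.6.
a1 = [0.858, 0.849, 0.810, 0.825, 0.747, 0.637, 0.561, 0.619]

# Full product matrix (a_j1 * a_k1), with all elements included.
P = [[aj * ak for ak in a1] for aj in a1]

E = [sum(row) for row in P]          # row sums E_j1
D1 = sum(a1)                         # sum of the first-factor coefficients
check = [aj * D1 for aj in a1]       # check sums a_j1 * D_1
```

Here `D1` comes to 5.906 and the two columns agree exactly when unrounded products are used; the small disagreements in the third decimal of Table 8.7 arise from summing products that were first rounded to three places.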

Subtract the values in $R_1^{\dagger}$ from the corresponding entries in R to get the matrix
of first-factor residuals $R_1$. This matrix is presented in Table 8.8, and is written in full
to simplify later multiplications with it. The sums of the rows are given in a column
alongside the matrix $R_1$, and they should be equal to the differences between the
corresponding sums in R and $R_1^{\dagger}$, that is,

    $S_{j1} = S_j - E_{j1}$.

Table 8.8

Matrix of First-Factor Residuals: $R_1$

 j      1      2      3      4      5      6      7      8     $S_{j1}$   $\alpha_{j2}^{(1)}$
 1    .118   .118   .110   .151  -.168  -.149  -.180  -.149   -.149   -.6082
 2    .118   .176   .193   .126  -.258  -.215  -.199  -.111   -.170   -.6939
 3    .110   .193   .177   .133  -.225  -.197  -.217  -.156   -.182   -.7429
 4    .151   .126   .133   .102  -.180  -.197  -.136  -.146   -.147   -.6000
 5   -.168  -.258  -.225  -.180   .312   .286   .311   .167    .245   1.0000
 6   -.149  -.215  -.197  -.197   .286   .281   .226   .183    .218    .8898
 7   -.180  -.199  -.217  -.136   .311   .226   .206   .192    .203    .8286
 8   -.149  -.111  -.156  -.146   .167   .183   .192   .196    .176    .7184

5. Trial values for second-factor coefficients.—To obtain the best set of trial
values for calculating the second-factor coefficients, an appropriate power of $R_1$
is employed. It is not necessary, however, to perform repeated squarings on $R_1$, since
formula (8.40) gives any power of the residual matrix in terms of that power of the
original correlation matrix and the product matrix. Furthermore, since the actual
entries of $R_1^2$, or any higher power of $R_1$, are not required for the determination of
trial values if the sums of the rows are known, additional labor can be saved. The
values $S_j^{(2)}$ and $E_{j1}$ may be considered as elements of the matrices $R^2$ and $R_1^{\dagger}$,
respectively. Then, according to (8.40),

    $S_{j1}^{(2)} = S_j^{(2)} - \lambda_1 E_{j1}$,

so that $S_{j1}^{(2)}$, the sum of the elements in row j of the matrix $R_1^2$, can be obtained without
calculating the individual entries in $R_1^2$.

Construct Table 8.9, in which each block contains the derivation of the trial values
from the power of $R_1$ represented by the superscript on $\alpha_{j2}$. In the first block the
values of $S_j$ and $E_{j1}$ are copied from Tables 8.2 and 8.7, and the sums $S_{j1}$ are obtained
by subtraction. Then the trial values $\alpha_{j2}^{(1)}$ are calculated by dividing the sums $S_{j1}$ by
the largest one (in absolute value), that is, by $S_{51} = .245$. Record the values of $S_j^{(2)}$
from Table 8.3 in the second block of Table 8.9, retaining only two decimal places
since all the work is based upon three significant figures, and one additional figure is
sufficient to assure the accuracy of the three figures. Compute the products $\lambda_1 E_{j1}$,
with the value $\lambda_1 = 4.4556$ taken from Table 8.6. The sums $S_{j1}^{(2)}$ are obtained simply
by subtraction, and the corresponding $\alpha_{j2}^{(2)}$ are then determined. These values are
truly significant to only one decimal place, and in the one significant figure they agree
with $\alpha_{j2}^{(1)}$. If the calculations in another block were attempted, the corresponding values
of $S_j^{(4)}$ and $\lambda_1^3 E_{j1}$ would be equal to three significant figures. For example, $S_1^{(4)} = 448$
and $\lambda_1^3 E_{11} = 448$. Hence the sums $S_{j1}^{(4)}$ (that is, $S_j^{(4)} - \lambda_1^3 E_{j1}$) are insignificant and
$\alpha_{j2}^{(4)}$ cannot be obtained. It therefore follows that the best set of trial values for
calculating $a_{j2}$ is $\alpha_{j2}^{(2)}$.

Table 8.9

Determination of Trial Values for Calculating the $F_2$ Coefficients

 Variable j    $S_j$    $E_{j1}$    $S_{j1}$   $\alpha_{j2}^{(1)}$  |  $S_j^{(2)}$   $\lambda_1 E_{j1}$   $S_{j1}^{(2)}$   $\alpha_{j2}^{(2)}$
     1       4.918   5.067   -.149   -.608   |   22.37    22.58    -.21    -.57
     2       4.844   5.014   -.170   -.694   |   22.07    22.34    -.27    -.73
     3       4.601   4.783   -.182   -.743   |   21.04    21.31    -.27    -.73
     4       4.726   4.873   -.147   -.600   |   21.50    21.71    -.21    -.57
     5       4.656   4.411    .245   1.000   |   20.02    19.65     .37    1.00
     6       3.981   3.763    .218    .890   |   17.10    16.77     .33     .89
     7       3.515   3.312    .203    .829   |   15.07    14.76     .31     .84
     8       3.831   3.655    .176    .718   |   16.54    16.29     .25     .68
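
The second block of Table 8.9 can be reproduced from the row sums alone. The following Python sketch copies $S_j^{(2)}$ and $E_{j1}$ from Table 8.9 and $\lambda_1$ from Table 8.6, and rounds to two decimals at each stage as the table does; the variable names are assumptions.

```python
lam1 = 4.4556
S2 = [22.37, 22.07, 21.04, 21.50, 20.02, 17.10, 15.07, 16.54]   # S_j^(2), rows of R^2
E = [5.067, 5.014, 4.783, 4.873, 4.411, 3.763, 3.312, 3.655]    # E_j1, rows of the product matrix

lam_E = [round(lam1 * e, 2) for e in E]                 # products lambda_1 * E_j1
S12 = [round(s - p, 2) for s, p in zip(S2, lam_E)]      # row sums of the squared residual matrix
pivot = max(S12, key=abs)                               # largest sum in absolute value
alpha2 = [round(s / pivot, 2) for s in S12]             # trial values from the second power

# One block further: the two terms agree to three significant figures,
# so their difference carries no information.
fourth_power_term = round(lam1 ** 3 * E[0])             # compare with S_1^(4) = 448
```

The computed differences carry only two significant digits at most, which is why the text regards the resulting trial values as reliable to a single decimal place and stops at the second power.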

The numbers $\alpha_{j2}^{(2)}$ are used as the first set of trial values $\alpha_{j2}$ by which to multiply
the matrix $R_1$ to obtain the first column of $R_1 q_2$ in Table 8.10. Then division of
these values by the largest one (1.492) yields the next set of trial values, which are also
designated by $\alpha_{j2}$ for simplicity. Multiplication of $R_1$ by these new values and division
of the derived values by the largest one of them produces the next set of trial values.
This process is continued until corresponding trial values in successive sets agree to
three significant figures. Keep four figures if three significant figures are desired for
the factor coefficients. Three iterations were sufficient, in the present example, for
stability in the trial values $\alpha_{j2}$.

Table 8.10

Calculation of the $F_2$ Coefficients

 Variable j   $\alpha_{j2}$    $R_1 q_2$    $\alpha_{j2}$    $R_1 q_2$    $\alpha_{j2}$    $R_1 q_2$    $\alpha_{j2}$    $a_{j2}$
     1       -.57    -.865   -.580   -.8856   -.5851   -.8852   -.5851   -.328
     2       -.73   -1.100   -.737  -1.1152   -.7368  -1.1148   -.7369   -.414
     3       -.73   -1.097   -.735  -1.1118   -.7345  -1.1112   -.7345   -.412
     4       -.57    -.902   -.605   -.9129   -.6031   -.9129   -.6034   -.339
     5       1.00    1.492   1.000   1.5136   1.0000   1.5129   1.0000    .561
     6        .89    1.348    .903   1.3666    .9029   1.3660    .9029    .507
     7        .84    1.300    .871   1.3144    .8684   1.3142    .8687    .488
     8        .68     .987    .662   1.0005    .6610   1.0001    .6610    .371

$\lambda_2 = 1.5129$,    $\sum_{j=1}^{8} \alpha_{j2}^2 = 4.8027$,    $\sqrt{\lambda_2} / \sqrt{\sum \alpha_{j2}^2} = .56126$

6. Second-factor coefficients.—When trial values $\alpha_{j2}$ have been found that are
stabilized, i.e., satisfy an equation like (8.32) but for the second factor, then the
coefficients for such a factor can be computed. The value 1.5129 corresponding to
$\alpha_{52} = 1.0000$ is the characteristic root $\lambda_2$, and the coefficients of the second factor are
given by a formula like (8.14), as indicated in the last column of Table 8.10.

The calculation of the coefficients $a_{j2}$ can be checked by means of the formula

    $\sum_{j=1}^{8} a_{j2}^2 = \lambda_2$.

In the example, the sum of the squares of the eight coefficients is 1.511 and $\lambda_2 = 1.5129$,
agreeing within errors of rounding.
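
Paragraphs 5 and 6 can be followed through in Python with the residual matrix of Table 8.8 and the trial values of Table 8.9. This is a sketch under stated assumptions: it simply repeats the multiplication a fixed number of times instead of watching for three-figure agreement, it assumes the scaling formula has the same form as for the first factor, and the variable names are assumptions. Intermediate columns need not match Table 8.10 digit for digit; only the stabilized values are compared.

```python
import math

# Matrix of first-factor residuals R_1, copied from Table 8.8.
R1 = [
    [ .118,  .118,  .110,  .151, -.168, -.149, -.180, -.149],
    [ .118,  .176,  .193,  .126, -.258, -.215, -.199, -.111],
    [ .110,  .193,  .177,  .133, -.225, -.197, -.217, -.156],
    [ .151,  .126,  .133,  .102, -.180, -.197, -.136, -.146],
    [-.168, -.258, -.225, -.180,  .312,  .286,  .311,  .167],
    [-.149, -.215, -.197, -.197,  .286,  .281,  .226,  .183],
    [-.180, -.199, -.217, -.136,  .311,  .226,  .206,  .192],
    [-.149, -.111, -.156, -.146,  .167,  .183,  .192,  .196],
]

# First set of trial values, from the second block of Table 8.9.
q = [-.57, -.73, -.73, -.57, 1.00, .89, .84, .68]

lam2 = 0.0
for _ in range(50):
    w = [sum(r * x for r, x in zip(row, q)) for row in R1]   # R_1 q_2
    lam2 = max(w, key=abs)          # value belonging to the trial value 1.0
    q = [x / lam2 for x in w]       # divide by the largest derived value

scale = math.sqrt(lam2 / sum(x * x for x in q))
a2 = [round(scale * x, 3) for x in q]     # second-factor coefficients
```

The stabilized root comes out close to the value 1.5129 of Table 8.10, and the coefficients agree with its last column to within a unit or two in the third decimal.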

7. Adequacy of solution.—To obtain additional factors, residuals would be
calculated with all preceding factors removed and then trial values for the new
factor coefficients would be determined, following the outline of paragraphs 4, 5, and 6.

For the given data no further factors are required. The second-factor residuals—
obtained by subtracting the products $a_{j2} a_{k2}$ from the corresponding first-factor
residuals of Table 8.8—are recorded in Table 8.11. These residuals are obviously
insignificant and so may be considered as final.


Table 8.11

Matrix of Second-Factor Residuals: $R_2$

 j      1      2      3      4      5      6      7      8
 1    .010     —      —      —      —      —      —      —
 2   -.018   .005     —      —      —      —      —      —
 3   -.025   .022   .007     —      —      —      —      —
 4    .040  -.014  -.007  -.013     —      —      —      —
 5    .016  -.026   .006   .010  -.003     —      —      —
 6    .017  -.005   .012  -.025   .002   .024     —      —
 7   -.020   .003  -.016   .029   .037  -.021  -.032     —
 8   -.027   .043  -.003  -.020  -.041  -.005   .011   .058


Inasmuch as the problem of factor analysis is to account for the total communality
variance, a more definite check on the adequacy of a solution is afforded by the extent
to which the sum of the contributions of the factors agrees with the original total
communality. In the present example, two common factors account for practically
100 per cent of the communality. The percentage contributions of the individual
factors are presented in Table 8.12, where the complete principal-factor pattern is
exhibited.
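
The percentage contributions are simple ratios of each factor's contribution to the total original communality. A minimal Python sketch with the figures recorded in Table 8.12 follows; the dictionary keys are assumptions chosen for illustration.

```python
total_original_communality = 6.024            # total of the original communalities
contributions = {"first factor": 4.455,       # sum of squares of first-factor coefficients
                 "second factor": 1.511}      # sum of squares of second-factor coefficients

per_cent = {name: round(100 * value / total_original_communality, 1)
            for name, value in contributions.items()}
```

The two percentages come to 74.0 and 25.1, so the two factors together leave less than one per cent of the original communality unaccounted for.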

8. Interpretation of principal factors.—The coefficients of the first factor in 
Table 8.12 are all large and positive, indicating an important general factor of physical 
growth (G) among these variables. On the other hand, the second factor has loadings 
of opposite signs for the two subgroups of variables. From the nature of the variables, 
this bipolar factor might be called “Stockiness.” If desired, of course, the signs of all 
the coefficients of this factor may be changed. Then this factor might be labeled 
“Lankiness.” 

Whatever name is selected for a bipolar factor, its opposite characteristic should be 
clearly recognizable. A more fundamental approach is to find a basic term which 
connotes the entire continuum. For example, a bipolar factor which is named “Heat” 
(or, “Cold”) would have the opposite characteristic “Cold” (or, “Heat”). A name 
representing both of these characteristics is “Temperature.” These two approaches 
may be indicated schematically as in Figure 8.1. 

Inasmuch as “Stockiness” and “Lankiness” are not clearly distinguishable as
opposites (according to a of Fig. 8.1), neither of these seems to be an appropriate
name for the bipolar factor. In an attempt to get a name, of the type b, which transcends
the specific descriptions of the variables, the term “Body Type” (BT) has been
adopted. On this continuum, variables describing different body types have
projections of opposite sign.

The diagram for these eight variables in the plane of the two principal factors is
presented in Figure 8.2, the coordinates coming from Table 8.12. The two subgroups
of variables lie in the first and fourth quadrants. Hence the projections of all the points
form a single cluster on the positive end of the G axis. The projections on the BT
axis, on the other hand, fall into two clusters which are widely separated. The
projections on the respective axes give the geometric basis for the interpretation of the
general and bipolar factors.

Table 8.12

Principal-Factor Pattern for Eight Physical Variables

                                   Pattern Coefficients(a)           Communality
 Variable j                         G       BT     $U_j$     (1) Original   (2) Calculated   (1)-(2)
 1. Height                        .858    -.328    .395        .854           .844
 2. Arm span                      .849    -.414    .328        .897           .892
 3. Length of forearm             .810    -.412    .417        .833           .826
 4. Length of lower leg           .825    -.339    .452        .783           .796
 5. Weight                        .747     .561    .357        .870           .873
 6. Bitrochanteric diameter       .637     .507    .581        .687           .663
 7. Chest girth                   .561     .488    .669        .521           .553
 8. Chest width                   .619     .371    .692        .579           .521
 Total                              —       —       —         6.024          5.968          .054
 Contribution of factor ($V_p$)  4.455    1.511                  —              —             —
 Per cent of total original
   communality                    74.0     25.1     —            —            99.1           .9

(a) Since the reliability of any one of these physical variables is close to unity, the unique factor in each case
may be considered as essentially the specific factor. The index of completeness of factorization (2.21) is
then approximately 100 times the calculated communality of each variable.

[Figure: (a) an axis through 0 with the ends labeled “Cold” and “Heat”; (b) an axis through 0 labeled “Temperature”]

Fig. 8.1.—Example of alternative ways of naming a bipolar factor

Fig. 8.2.—Eight physical variables plotted with respect to two principal-factor axes

8.6. Outline of Electronic Computer Program 

Factor analysis has been troubled with the practical difficulty of requiring an 
excessive amount of computational work. This is indicated, more or less, throughout 
the text but is especially evident in this chapter. On the other hand, the foregoing 
development of the principal-factor method points to a certain elegance and precision 
of mathematical form lacking in earlier methods. For this reason, alone, it is a highly 
desirable form of factorization of a correlation matrix, in spite of the fact that bipolar 
type factors may not be acceptable to psychologists. The procedure usually recom¬ 
mended by psychologists is to initiate the analysis of a correlation matrix by means 
of some arbitrary solution and then rotate it to some (psychologically) more meaning¬ 
ful solution (see part iii). Thurstone, the leader and chief proponent of this philosophy, 
has stated [477, p. 509]: “When the principal-axes solution becomes available with 
less computational labor, it will, no doubt, be preferred by all students of this subject, 
and they will start the rotational problem with the principal-factor matrix.” 
While the computation of a principal-factor solution for an 8-variable problem is
feasible by the method outlined in 8.5 (generally requiring less than ten hours of hand
calculations), the situation is entirely different for a problem of the order of magnitude
of the 24-variable example of 7.6. The computational labor is simply prohibitive for
the complete principal-factor analysis by hand methods. The time required for the
calculation of the first-factor weights, alone, for the twenty-four variables is more
than seventy hours [243, p. 175]. The determination of each additional factor is
estimated to require upwards of forty hours. In 1941, Holzinger and Harman [243]
recommended that the application of the direct principal-factor method to large sets
of variables await the development of appropriate computing machinery.

The time has arrived in which the principal-factor method is feasible, even for very 
large matrices of correlations. What makes it possible is the rapid development and 
availability of high-speed digital electronic computers. While there are financial 
constraints to the procurement of such computing facilities, most universities and 
government and industrial research centers have such equipment and make it 
available, frequently without cost, to individuals engaged in research. The principal- 
factor method can now be considered in its own right as a preferred type of solution, 
or as an excellent reduction of the correlation matrix which provides a basis for 
rotation to some other form of solution. 

While the computing procedure developed in 8.4 is appropriate for desk cal¬ 
culators, it is not the most expeditious method for programming a high-speed digital 
computer. There, it will be recalled, one eigenvalue and its associated eigenvector were
calculated, and then the others were determined successively. With a computer it is
more convenient to obtain all the eigenvalues and their associated eigenvectors 
simultaneously. 

The factor analysis theory of 8.3 leads to a classical problem in mathematics:
the determination of the eigenvalues and associated eigenvectors of the reduced
correlation matrix R. The computing procedures of numerical analysis for the
solution of systems of linear equations, with the associated problems of matrix
inversion and characteristic equations, are then available to factor analysis. In the
publication Mathematical Methods for Digital Computers [200] there is not only a
chapter on factor analysis proper, but Part II on matrices and linear equations has
a direct bearing on factor analysis. Modern-day computing adaptations of the
original work done by Jacobi [278] more than a century ago have been found very
effective in programming the principal-factor solution on all present-day electronic
computers. The earliest programs written were for the ORDVAC [539], the Johnniac
[161], and the IBM 704 [418]. These programs have since been adapted to the latest
computers throughout the world. Improvements in speed and accuracy of the
convergence process have been the subject of many papers (see, for example, [427],
[428], [516], [521]), and no doubt will continue to be for a long time to come. It is the
intent of this section to convey the general spirit of the basic computing routine.

The essence of the modified Jacobi method involves the diagonalization of the
matrix R by performing a sequence of orthogonal transformations on it, designed to
reduce one off-diagonal element to zero at each stage. Each orthogonal transformation
is of the form $B_{jk} R B'_{jk}$ where

(8.41)
$$
B_{jk} = \begin{bmatrix}
1 & & & & \\
 & \cos\theta_{jk} & \cdots & \sin\theta_{jk} & \\
 & \vdots & 1 & \vdots & \\
 & -\sin\theta_{jk} & \cdots & \cos\theta_{jk} & \\
 & & & & 1
\end{bmatrix}
$$

with the four elements in the intersections of rows and columns j and k as indicated,
all other diagonal elements unity, and all other elements zero. The angle of rotation
$\theta_{jk}$ is chosen so as to transform the element $r_{jk}$ into zero, and is defined by

(8.42)    $\tan 2\theta_{jk} = \dfrac{2r_{jk}}{r_{jj} - r_{kk}}$.

When an element is reduced to zero it does not, in general, remain at zero during
subsequent transformations. However, the sum of squares of off-diagonal elements is
decreased each time by an amount corresponding to $2r_{jk}^2$, and thus Jacobi's method
guarantees the convergence of the off-diagonal elements of R to zeros (to a designated
number of decimal places) with a sufficient number of iterations.
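
A single rotation of this kind can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not the routine of any of the programs cited. The names `jacobi_rotation` and `off_diagonal_ss` and the 3 x 3 matrix are hypothetical. The sketch assumes that the rotation matrix carries cos θ and sin θ in row j and -sin θ and cos θ in row k, that the angle satisfies tan 2θ = 2 r_jk / (r_jj - r_kk), and it obtains the angle with `arctan2` instead of the purely algebraic identities a machine program would use.

```python
import numpy as np

def jacobi_rotation(R, j, k):
    """Return (B R B', B) for the plane rotation B that zeroes element (j, k)."""
    n = R.shape[0]
    # tan(2*theta) = 2 r_jk / (r_jj - r_kk); arctan2 also handles r_jj == r_kk.
    theta = 0.5 * np.arctan2(2.0 * R[j, k], R[j, j] - R[k, k])
    c, s = np.cos(theta), np.sin(theta)
    B = np.eye(n)
    B[j, j], B[j, k] = c, s      # row j:  cos,  sin
    B[k, j], B[k, k] = -s, c     # row k: -sin,  cos
    return B @ R @ B.T, B

def off_diagonal_ss(M):
    """Sum of squares of the off-diagonal elements of M."""
    return np.sum(M ** 2) - np.sum(np.diag(M) ** 2)

# Hypothetical correlation matrix, used only for illustration.
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

D1, B12 = jacobi_rotation(R, 0, 1)
drop = off_diagonal_ss(R) - off_diagonal_ss(D1)
print(D1[0, 1], drop, 2 * R[0, 1] ** 2)
```

The printed values show the rotated element vanishing (to rounding error) and the off-diagonal sum of squares falling by twice the square of the element removed, while the other off-diagonal elements are merely redistributed.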

In the process of reducing the original matrix R to a diagonal matrix D, the
off-diagonal elements are considered systematically. Since the original correlations $r_{jk}$
are altered by the transformations (8.41), it would seem more appropriate to designate
a general off-diagonal element by $d_{jk}$ and the intermediate derived matrices by D’s
with suitable subscripts indicated below. Thus, at any stage of a specific transformation
for reducing the off-diagonal element $d_{jk}$ to zero, the following notation
will be employed:

(8.43)    ${}_{v}D = B_{jk}\,{}_{v-1}D\,B'_{jk}$,

where the sequencing of the transformations is derived from:

(8.44)    $v = n(j - 1) - \dfrac{j(j + 1)}{2} + k$,

in which j = 1, 2, ..., n - 1 and k = j + 1, j + 2, ..., n, producing the n(n - 1)/2
possible combinations.

At any stage the product of the v individual transformations may be designated

(8.45)    ${}_{v}B = B_{jk} \cdots B_{13} B_{12}$,

where v is given by (8.44), and when all combinations of j and k have been tried, the
product of these transformations in the i th iteration is designated:

(8.46)    $B_i = \prod (B_{jk};\ j < k = 1, 2, \ldots, n), \qquad (i = 1, 2, \ldots, \mu)$.

The diagonalizing effect of the i th iteration is given by:

(8.47)    $D_i = B_i D_{i-1} B'_i, \qquad (i = 1, 2, \ldots, \mu)$.

If the product (in the indicated order) of the transformation matrices through the i th
iteration is defined by:

(8.48)    $P_i = B_i B_{i-1} \cdots B_2 B_1$,

then formula (8.47) can be expressed in terms of the original matrix R as follows:

(8.49)    $D_i = P_i R P'_i$.
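
The sequencing rule can be checked by enumeration. The short sketch below assumes that the sequence number is v = n(j - 1) - j(j + 1)/2 + k, with the pairs taken row by row (j = 1, ..., n - 1 and k = j + 1, ..., n). The function name `sequence_index` and the choice n = 5 are hypothetical.

```python
def sequence_index(j, k, n):
    """Sequence number v of the pair (j, k), 1-based with j < k."""
    # j * (j + 1) is always even, so integer division is exact.
    return n * (j - 1) - j * (j + 1) // 2 + k

n = 5
pairs = [(j, k) for j in range(1, n) for k in range(j + 1, n + 1)]
indices = [sequence_index(j, k, n) for j, k in pairs]
print(list(zip(pairs, indices)))
```

Under that reading the pairs (1, 2), (1, 3), ..., (n - 1, n) receive the consecutive numbers 1 through n(n - 1)/2, so v simply counts the individual transformations within one iteration.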

After a specified number of iterations $\mu$, the solution to the problem will be in the
form:

(8.50)    $D_\mu = B_\mu D_{\mu-1} B'_\mu = B R B' = D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$,

where

(8.51)    $B = P_\mu = B_\mu B_{\mu-1} \cdots B_2 B_1$.

The diagonal elements $\lambda_p$ of D are the eigenvalues of R, and the rows of the final
transformation matrix B contain the corresponding eigenvectors

$\alpha_p = (\alpha_{1p}, \alpha_{2p}, \ldots, \alpha_{np})$

of R. The row vector is shown as the transpose of the column vector previously
defined in 8.3. Since each individual transformation matrix is orthogonal (and hence
the product of such matrices is orthogonal), the resulting final matrix B is orthogonal
and therefore the eigenvectors are normalized. Then the coefficients of each factor
$F_p$ are obtained simply by multiplying the square root of the eigenvalue by its
associated eigenvector, namely:

(8.52)    $a_p = \alpha_p \sqrt{\lambda_p}$.

This formula corresponds to (8.14) in which the first arbitrary eigenvector had to be
normalized as well as converted to factor weights.

In general, when the rank m of R is equal to its order n, there will be n real
non-negative eigenvalues, and coefficients for n common factors will ensue. If, however,
the reduced correlation matrix R is positive semi-definite and of rank m < n, then
there will be n - m zero eigenvalues and there will be only m common factors with
non-zero coefficients. Of course, if the Gramian properties are not satisfied by the
reduced correlation matrix, then some of the eigenvalues will be negative and the
coefficients of the corresponding factors will be imaginary.

An outline of the foregoing procedures for programming a large-scale digital
electronic computer is presented in the flowchart of Figure 8.3. The detailed description
of each step in this flowchart is as follows:

1. The n(n + 1)/2 distinct elements of the reduced correlation matrix R are
stored by rows, each row starting with its diagonal element. Also stored are
the trace T(R), the order of the matrix, n, and the maximum number of
iterations, $\mu$ (it may be desired to stop the machine after a predetermined
number of iterations regardless of the degree of convergence). The matrix R
also is put in the working location ${}_{v}D$ as the initial ${}_{0}D$.

2. The identity matrix is put in the ${}_{v}B$ locations as the initial ${}_{0}B$. The counter j
is set to 1 and k to 2.

3. The $d_{jk}$ element of ${}_{v}D$ (initially, $r_{jk}$ of R) is tested to see if it is zero to the
specified ($\epsilon_1$) degree of accuracy. If $d_{jk}$ is zero to the degree $\epsilon_1$, proceed to
step 7, otherwise to step 4.

4. Compute $B_{jk}$ as indicated in formula (8.41), using formula (8.42) and well-known
trigonometric identities to get the values of the sine and cosine of the
transformation angle (employing only algebraic functions of the argument).

5. Compute ${}_{v}D$ according to formula (8.43), remembering that ${}_{0}D = R$.

6. Compute ${}_{v}B$ according to formula (8.45), which is equivalent to ${}_{v}B = B_{jk}\,{}_{v-1}B$,
and again, ${}_{0}B = I$.

7. If k has reached n then all columns have been considered for the particular
choice of the row j and the test of step 9 is made. Otherwise proceed to step 8.

8. The index k is increased to continue with the next individual transformation
in the particular iteration.

9. If j has not reached n - 1 proceed to step 10. When j has reached n - 1
then the i th iteration has been completed, i.e., the $B_i$ of formula (8.46) is the
last computed ${}_{v}B$ when all combinations of j and k have been considered in
this iteration, and the result of the iteration is $D_i$. The test of step 11 is then
made.

10. The index j is increased to j + 1 (with the associated k as j + 2) to continue
with the next series of individual transformations, involving the elements of
the new row, in the particular iteration i.

11. As a check on the preceding calculations, the trace of the resulting matrix $D_i$
is determined to see that it does not vary from the original value, according
to the property:

$\sum_{j=1}^{n} h_j^2 = T(R) = T(D) = \sum_{p=1}^{n} \lambda_p$.

If the test is satisfied go to step 13. Otherwise proceed to step 12.

12. This is a conditional stop. The magnitude of the allowable error is to be
determined by the user of the program. Instructions to the computer operator
should include what to do in the event the stop occurs. The instructions may
call for certain memory locations to be printed out; or if the user desires,
they may even include a continuation of the process after a record is made of
the stop.

13. Compute $D_i P_i$ and $P_i R$, employing the $P_i$ of formula (8.48) for the product
of all the transformation matrices through the i th iteration.

14. Another check is provided by postmultiplying both sides of formula (8.49)
by $P_i$, noting that $P'_i P_i = I$ from the orthogonality property of $P_i$. If each
element of $D_i P_i$ agrees with the corresponding element of $P_i R$ within the
specified ($\epsilon_3$) degree of accuracy, proceed to step 16. Otherwise go to step 15.

15. This conditional stop is treated the same way as step 12.

16. The $D_i$ and $P_i$ are retained in temporary storage, being changed as a result
of each successive complete iteration.

17. Count the number (c) of off-diagonal elements $d_{jk}$ that are zero within the
specified ($\epsilon_1$) degree of accuracy.

18. If all off-diagonal elements are not zero to the degree $\epsilon_1$, go to step 19.
Otherwise the process has converged, so proceed to step 20.

19. If the number of iterations i has not reached the maximum number $\mu$, go
back to step 2 to initiate the next iteration. Otherwise go to step 20.

20. Since the eigenvectors appear in the rows of B, and since the factor matrix A
conventionally contains the coefficients of the respective factors in columns,
the conversion (8.52) for all the common factors may be expressed by:

$A = B' D^{1/2}$.
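
The twenty steps can be condensed into a short program. The following Python/NumPy sketch is a hedged illustration of a cyclic Jacobi routine in the spirit of the flowchart, not a transcription of any of the programs cited. The function name `principal_factors_jacobi`, the tolerance `eps`, the limit `max_iter`, and the test matrix are all hypothetical. The rotation is assumed to have cos θ and sin θ in row j and -sin θ and cos θ in row k, with tan 2θ = 2 d_jk / (d_jj - d_kk). Three simplifications are made: the angle comes from `arctan2` rather than algebraic identities, the transformations are accumulated directly into one matrix B instead of per-iteration products, and a negative eigenvalue is given zero coefficients rather than imaginary ones.

```python
import numpy as np

def principal_factors_jacobi(R, eps=1e-10, max_iter=50):
    """Cyclic Jacobi diagonalization of a symmetric matrix R.

    Returns (lam, B, A): eigenvalues, eigenvectors in the rows of B,
    and factor coefficients A = B' D^(1/2) in columns.
    """
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    D = R.copy()                 # steps 1-2: working matrix starts as R
    B = np.eye(n)                # accumulated transformations start as I
    for _ in range(max_iter):
        for j in range(n - 1):                  # steps 7-10: all pairs j < k
            for k in range(j + 1, n):
                if abs(D[j, k]) <= eps:         # step 3: already zero
                    continue
                # steps 4-6: build the rotation, update D and B
                theta = 0.5 * np.arctan2(2.0 * D[j, k], D[j, j] - D[k, k])
                c, s = np.cos(theta), np.sin(theta)
                Bjk = np.eye(n)
                Bjk[j, j], Bjk[j, k] = c, s
                Bjk[k, j], Bjk[k, k] = -s, c
                D = Bjk @ D @ Bjk.T
                B = Bjk @ B
        # step 11: the trace must be unchanged
        if not np.isclose(np.trace(D), np.trace(R)):
            raise ArithmeticError("trace check failed")
        # step 14: D B must agree with B R
        if not np.allclose(D @ B, B @ R, atol=1e-8):
            raise ArithmeticError("orthogonality check failed")
        # steps 17-18: stop when every off-diagonal element is zero to eps
        off = D - np.diag(np.diag(D))
        if np.all(np.abs(off) <= eps):
            break
    lam = np.diag(D).copy()
    # step 20: scale each eigenvector (column of B') by sqrt(eigenvalue)
    A = B.T * np.sqrt(np.clip(lam, 0.0, None))
    return lam, B, A

# Hypothetical positive definite correlation matrix for illustration.
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
lam, B, A = principal_factors_jacobi(R)
print(np.sort(lam)[::-1])
```

For a Gramian matrix the full set of columns of A reproduces R as A A'. In practice only the columns belonging to the largest eigenvalues would be retained as common factors.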


8.7. Illustrative Examples 

To demonstrate the feasibility of the principal-factor method, now that computers 
are generally available, several examples will be given. 

1. Five socio-economic variables. First, it should be pointed out that the
principal-component solution of Table 8.1 was obtained on a relatively slow Philco
2000 in a matter of a few seconds. If, instead of ones in the diagonal, SMC’s are used
as estimates of communality* for the reduced correlation matrix, the resulting
eigenvalues become:

2.73429, 1.71607, .03955, -.02452, -.07261,

and the solution in terms of two factors is shown in Table 8.13. Of course, since the
SMC’s are lower bounds for the communalities (see 5.7), the amount of variance
accounted for by these two principal factors (4.450) is less than the 4.670 of the first
two principal components. More important, of course, is the fact that the solution
in Table 8.13 is for a different model than that in Table 8.1. Even if only the first
two principal components are retained, that does not make the latter solution equivalent
to the former. The models (2.8) and (2.9) make for a fundamental difference in the
way the components or the factors are measured (see chapter 16).

* Most principal-factor computer programs provide an option for the values to go in the
diagonal of the correlation matrix, with SMC’s as one of the choices; and the desired values are
computed as the initial step in the analysis.


Table 8.13

Principal-Factor Solution for Five Socio-Economic Variables
(Communality estimates: SMC’s)

                                     Common Factors^a              Communality
Variable                               P1        P2      (1) Original  (2) Calculated   (1)-(2)
1. Total population                  .62533    .76621      .96859         .97811        -.00952
2. Median school years               .71417   -.55535      .82227         .81845         .00382
3. Total employment                  .71414    .67949      .96918         .97170        -.00252
4. Misc. profess. services           .87979   -.15879      .78572         .79924        -.01352
5. Median value house                .74107   -.57764      .84701         .88285        -.03584
Contribution of factor (V_p)        2.73429   1.71607                                   -.05758
Per cent of original communality      62.2                                                -1.3

a The P’s in this table and in Table 8.1 are used as generic symbols for principal factors or components,
the text making it clear which is implied.


Another recommendation for estimates of communality is that determined from 
all the principal components whose eigenvalues are greater than one. From Table 8.1 
it can be seen that only the first two components have eigenvalues greater than one. 
The “communalities” of the five variables calculated from these components are: 

.98783, .88515, .97930, .88023, .93746. 

If these values are inserted in the diagonal of the correlation matrix, the resulting 
factorization yields the following eigenvalues: 

2.79652, 1.75496, .10642, .02094, -.00888, 

and a solution, again in terms of two factors, is obtained which accounts for slightly 
more of the variance than that shown in Table 8.13. Rather than exhibiting that 
solution, another solution was obtained by iteration—the preceding communalities 
were used initially, then replaced by the communalities obtained from the first two 
factors of the resulting solution, and the process repeated until the communalities 
agreed to three decimal places on successive trials. This was accomplished in 14 
iterations, with resulting communalities: 

1.00000, .76687, .95562, .79837, .97142. 

The principal-factor solution, with these communalities, is shown in Table 8.14.

Table 8.14

Alternate Principal-Factor Solution for Five Socio-Economic Variables
(Communality estimates: iteration by refactoring)

                                   Common Factors^a
Variable                             P1        P2      Communality
1. Total population                 .622      .785        1.000
2. Median school years              .702     -.524         .767
3. Total employment                 .701      .681         .956
4. Misc. profess. services          .882     -.145         .798
5. Median value house               .779     -.604         .971
Contribution of factor (V_p)       2.756     1.740        4.452
Per cent of communality             61.3      38.7

a See footnote to Table 8.13.


While there is considerable similarity between the results in Tables 8.13 and 8.14,
there are also some differences due to the different choices for the communalities.
The differences are more apparent in this small problem than they would be if the
number of variables were larger; the relative effect of the n diagonal values (out of a
total of n² elements involved in the calculations) decreases with an increase in n.

Before leaving this example, it should be pointed out that the process employed in
getting the result of Table 8.14 served as a forerunner for the method of minimum
residuals (minres) which is presented in chapter 9. The solution of Table 8.14 may
be seen to be very close to the minres solution of Table 9.2.
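
The iteration by refactoring that produced Table 8.14 can be sketched as follows. This is an illustrative Python/NumPy outline under stated assumptions, not the program actually used. The function name `iterate_communalities`, its parameters, and the four-variable correlation matrix (built from a hypothetical single factor with loadings .8, .7, .6, .5) are inventions for the example, and NumPy's `eigh` stands in for the Jacobi routine of 8.6.

```python
import numpy as np

def iterate_communalities(R, h2_start, m, tol=1e-3, max_iter=1000):
    """Refactor repeatedly: put the current communalities in the diagonal,
    take the first m principal factors, and replace the communalities by
    the row sums of squares, until successive values agree within tol."""
    R = np.asarray(R, dtype=float)
    h2 = np.asarray(h2_start, dtype=float).copy()
    for it in range(1, max_iter + 1):
        Rh = R.copy()
        np.fill_diagonal(Rh, h2)                 # reduced correlation matrix
        lam, vec = np.linalg.eigh(Rh)            # eigenvalues in ascending order
        idx = np.argsort(lam)[::-1][:m]          # positions of the m largest
        A = vec[:, idx] * np.sqrt(np.clip(lam[idx], 0.0, None))
        h2_new = np.sum(A ** 2, axis=1)          # communalities from m factors
        if np.max(np.abs(h2_new - h2)) < tol:
            return A, h2_new, it
        h2 = h2_new
    return A, h2, max_iter

# Hypothetical data: correlations generated exactly by one common factor.
a = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(a, a)
np.fill_diagonal(R, 1.0)

A, h2, its = iterate_communalities(R, np.ones(4), m=1, tol=1e-6)
print(its, np.round(h2, 3))
```

The default tolerance mirrors the agreement to three decimal places described above. The example call tightens it so that the recovered communalities can be compared with the squared loadings that generated the matrix.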

2. Eight emotional traits. Next, an example of eight variables is presented from
a field of psychology for which Burt provided the correlations many years ago. He
discussed the factorial analysis of emotional traits in another paper [55], indicating
the nature of the resulting factors, but did not actually obtain a principal-factor
solution.

The correlations for these traits are given in Table 8.15, including communalities
which were estimated from a solution essentially of the bi-factor form. A principal-factor
analysis disclosed the fact that two common factors adequately accounted for
these communalities. The principal-factor pattern is given in Table 8.16.

The first factor may appropriately be called “General emotionality” (G), although
the present sample of emotional traits is small. The second factor may be named
from the traits with significant coefficients. As a rough estimate, the standard error
of a factor coefficient may be taken from Table B of the Appendix. Thus, for N = 172
and an average correlation r = .48, the standard error is .064 and any coefficient
as large as .20 could certainly be considered significant. The traits with significant
coefficients are anger, wonder, and tenderness. Since wonder and
anger are indicative of an egocentric personality, and tenderness is indicative of
timidity, the factor characterizing these two opposing emotions may be called

Table 8.15


Intercorrelation of Eight Emotional Variables for 
172 Normal Children Aged Nine to Twelve 


Variable 

1 

2 

3 

4 

5 

1. Sociability 

.94 

— 

— 

— 

— 

2. Sorrow 

.83 

.94 

— 



3. Tenderness 

.81 

.87 

.89 



4. Joy 

5. Wonder 

wm 

.62 

.59 

.63 

.37 

.57 

6. Disgust 

7. Anger 

8. Fear 

.54 

.53 

.24 

.58 

.44 

.45 

.30 

.12 

.33 

.30 

.28 

.29 

.34 

.55 

.19 


6 

7 

8 

— 

— 

— 

— 

— 

— 

.28 

— 

— 

.38 

.63 

— 

.21 

.10 

.12 


Table 8.16

Principal-Factor Pattern for Eight Emotional Traits

                                          Common Factors            Communality
Variable                                    G        E      (1) Original  (2) Calculated  (1)-(2)
1. Sociability                             .98      .06        .94            .96          -.02
2. Sorrow                                  .95     -.14        .94            .92           .02
3. Tenderness                              .81     -.51        .89            .92          -.03
4. Joy                                     .72     -.10        .50            .53          -.03
5. Wonder                                  .68      .32        .57            .56           .01
6. Disgust                                 .53      .14        .28            .30          -.02
7. Anger                                   .52      .60        .63            .63           .00
8. Fear                                    .35     -.14        .12            .14          -.02
Total                                                         4.87           4.96          -.09
Contribution of factor (V_p)              4.17      .79
Per cent of total original communality    85.6     16.2                     101.8          -1.8


“Egocentricity” (E). If it is desired to change the signs of all the coefficients, then the
factor may be called “Timidity.” In Burt’s discussion fear and sorrow are classed
with tenderness, and in the present analysis each of these traits has a coefficient of
-.14. These values have some statistical significance and help substantiate the
interpretation of the second factor.

Before leaving this example, it should be pointed out that an attempt was made
to analyze these data with SMC’s as communalities, but real computing difficulties
were encountered. The most convenient way of getting the squared multiple correlations,
according to (5.38), involves the calculation of the inverse of the correlation
matrix (of course, the full matrix with ones in the diagonal). While such a matrix
must be Gramian (see 3.2, par. 15), the value of the determinant of the correlation
matrix, produced by the computer, is only .00062. Obviously, the determinant is so
close to zero that, for computing purposes, the inverse of the matrix does not exist.
When an “inverse” was calculated by the computer (it could not tell that the
determinant was insignificantly different from zero), most of the diagonal elements
were negative, and the corresponding SMC’s greater than one. Another way of putting
it is to say that the rank of the full correlation matrix is less than eight, and
therefore a case of multicollinearity must exist among the eight variables.
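
The difficulty can be guarded against in a program. The sketch below, in Python with NumPy, computes squared multiple correlations from the inverse of the full correlation matrix using the standard relation SMC_j = 1 - 1/r^jj (taken here to be what formula (5.38) expresses), and it refuses to proceed when the matrix is nearly singular. The function name, the condition-number threshold `max_cond`, and both example matrices are hypothetical choices for illustration.

```python
import numpy as np

def squared_multiple_correlations(R, max_cond=1e8):
    """SMC of each variable with all the others: 1 - 1/r^jj, where r^jj is
    the j-th diagonal element of the inverse of the full correlation matrix."""
    R = np.asarray(R, dtype=float)
    cond = np.linalg.cond(R)
    if cond > max_cond:
        # An "inverse" could still be produced, but it would be meaningless.
        raise np.linalg.LinAlgError(
            f"correlation matrix is nearly singular (condition number {cond:.3g})")
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# Hypothetical well-conditioned correlation matrix.
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
smc = squared_multiple_correlations(R)
print(np.round(smc, 4))
```

A condition number is used instead of the determinant because the determinant of a correlation matrix shrinks with the order of the matrix even when no multicollinearity is present. The threshold is a matter of judgment.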

3. Eight political variables. Another example has been selected from a set of
political variables in order to illustrate the applicability of the principal-factor
solution in an entirely different field. The data also furnish a solution in which all the
factors, including the first, are of the bipolar form. A set of eight variables was selected
from a larger group of seventeen political variables, analyzed by Gosnell and
Schmidt [162]. The smaller set includes the variables which are among the best
measures of the factors given in Gosnell’s solution. A brief description of these
variables, measured in 147 Chicago election areas, follows:

1. Lewis: Percentage of the total Democratic and Republican vote cast for Lewis
2. Roosevelt: Corresponding percentage for Roosevelt
3. Party voting: Percentage that the straight-party votes were of the total
4. Median rental: Median rental (in dollars)
5. Homeownership: Percentage of the total families that own their homes
6. Unemployment: Percentage unemployed in 1921 of the gainful workers ten years of age and over
7. Mobility: Percentage of total families that have lived less than one year at present address
8. Education: Percentage of population, eighteen years and older, which completed more than ten grades of school

The intercorrelations of these variables are given in Table 8.17, in which
communalities are also recorded. These communalities were computed by the method
of 5.4.

The principal-factor pattern is presented in Table 8.18. It may be observed that
several variables have large negative coefficients for the first factor, in contrast with
the positive coefficients found in the preceding solutions. In the present example
the first factor, being of the bipolar type, may be named from the nature of the
variables in the subsets (1, 2, 3) and (4, 7, 8). The variables of the first subset may
be regarded as measures of the “Traditional Democratic Vote” (TDV), which
is taken as the name of the factor. The variables of the second subset characterize
the sociological level of the election areas and seem to be opposite in nature to the
“Traditional Democratic Vote.” The high weight for variable 6 (Unemployment)
is consistent with the foregoing interpretation inasmuch as high unemployment is
associated with traditional vote.

In the case of the second factor, the largest weights appear for variables 5
(Homeownership) and 7 (Mobility), being +.65 and -.56, respectively. Inasmuch as both
Homeownership and lack of Mobility are aspects of a single characteristic, the second

Table 8.17

Intercorrelations of Eight Political Variables for 147 Election Areas

Variable               1     2     3     4     5     6     7     8
1. Lewis             .52
2. Roosevelt         .84  1.00
3. Party Voting      .62   .84   .78
4. Median Rental    -.53  -.68  -.76   .82
5. Homeownership     .03  -.05   .08  -.25   .36
6. Unemployment      .57   .76   .81  -.80   .25   .80
7. Mobility         -.33  -.35  -.51   .62  -.72  -.58   .63
8. Education        -.63  -.73  -.81   .88  -.36  -.84   .68   .97


Table 8.18

Principal-Factor Solution for Eight Political Variables^a

                                          Common Factors            Communality
Variable                                   TDV       HP     (1) Original  (2) Calculated  (1)-(2)
1. Lewis                                   .69     -.28        .52            .55          -.03
2. Roosevelt                               .88     -.48       1.00           1.00           .00
3. Party Voting                            .87     -.17        .78            .79          -.01
4. Median Rental                          -.88     -.09        .82            .78           .04
5. Homeownership                           .28      .65        .36            .50          -.14
6. Unemployment                            .89                 .80            .79           .01
7. Mobility                               -.66     -.56        .63            .75          -.12
8. Education                              -.96     -.15        .97            .94           .03
Total                                                         5.88           6.10          -.22
Contribution of factor (V_p)
Per cent of total original communality                                      103.7          -3.7

a A principal-factor solution with SMC’s as estimates of the communalities produced two factors with
contributions of 5.01 and 1.28.


factor may be termed “Home Permanency” (HP). The negative factor weights for 
the first three variables again appear to verify the naming of this factor. This bipolar 
factor is conveniently described by a single name because the opposing variables 
may be considered as measures on a single scale in opposite directions. 

4. Twenty-four psychological tests. —The final illustration is of a fair-sized 
problem. A problem of this magnitude is commonplace today, but in 1940 it took 
more than 100 hours to calculate the first two factors by hand methods. Therefore, 
it was quite an accomplishment to get a complete factorization on the ORDVAC 
at Aberdeen Proving Grounds in about 40 minutes (in 1952) and on an IBM 704
in about 8 minutes (in 1958). By 1965 this time was cut to about 2 minutes on the
IBM 7094, and since then the time has been slashed to less than one minute on such 
large-scale machines as IBM System/360, CDC 6600, or GE 625. 

For the correlation matrix of Table 7.4, with unities in the diagonal, the resulting
roots of the characteristic equation are given in Table 8.19. This example demonstrates
that property of characteristic equations of real symmetric matrices which says that
all roots are real; and for the case of a positive semi-definite matrix R, that all roots
are non-negative. While all twenty-four eigenvectors were obtained, for economy of
space only the first ten principal components are presented in Table 8.20. In addition
to the factor weights, the contribution $V_p$ (equivalent to the eigenvalue $\lambda_p$) of
factor $P_p$ is given at the bottom of each column, along with the percentage that this
number represents out of the total variance of 24.

Table 8.19

Contributions (Eigenvalues) of 24 Principal Components for
Twenty-Four Psychological Tests

Order   Eigenvalue      Order   Eigenvalue
  1       8.135           13       .533
  2       2.096           14       .509
  3       1.693           15       .477
  4       1.502           16       .390
  5       1.025           17       .382
  6        .943           18       .340
  7        .901           19       .334
  8        .816           20       .316
  9        .790           21       .297
 10        .707           22       .268
 11        .639           23       .190
 12        .543           24       .172

The general characteristics of any principal-factor solution are demonstrated by 
the data in Table 8.20. First of all, the contributions of the factors to the total variance 
of the variables (or total communality, when that is being analyzed) decrease with 
each succeeding factor. This immediately provides a rough statistical guide as to the 
maximum error that might be introduced by stopping the analysis too soon. If the 
last factor retained contributes 5 % to the total variance, it is known that the next 
factor, or any succeeding one, will not contribute as much as 5 %. Of course, this 
determination of the effect of each succeeding factor can be judged from the
eigenvalues (which are obtained first in the computer) without the actual computation
of the factor weights. Another characteristic to be noted is that the first factor has
positive loadings for every variable, since (almost) all correlations are positive in
the original matrix. For all succeeding factors the positive and negative coefficients
are about equal in number. Sometimes a principal-factor solution will serve
adequately as the final analysis of a correlation matrix, while in other situations it
will be desirable to transform such a solution to a different frame of reference.

Table 8.20

First Ten Principal Components for Twenty-Four Psychological Tests

Test           P1     P2     P3     P4     P5     P6     P7     P8     P9    P10
1            .616  -.005   .428  -.204  -.009   .070   .199   .220  -.153   .156
2            .400  -.079   .400  -.202   .348   .089  -.506  -.024  -.263  -.256
3            .445  -.191   .476  -.106  -.375   .329  -.083  -.356  -.060   .021
4            .510  -.178   .335  -.216  -.010  -.192   .461   .142   .135  -.266
5            .695  -.321  -.335  -.053   .079   .078  -.123   .028  -.232  -.010
6            .690  -.418  -.265   .081  -.008   .124   .001   .129  -.054  -.129
7            .677  -.425  -.355  -.072  -.040   .011   .081   .009   .001  -.091
8            .694  -.243  -.144  -.116  -.141   .119   .158  -.172   .115  -.065
9            .694  -.451  -.291   .080  -.005  -.071  -.009   .117  -.124   .029
10           .474   .542  -.446  -.202   .079  -.085  -.013  -.080   .079  -.099
11           .576   .434  -.210   .034   .002   .301  -.043   .320  -.004   .062
12           .482   .549  -.127  -.340   .099   .039   .158  -.301  -.132   .159
13           .618   .279   .035  -.366  -.075   .364   .130   .175  -.040   .104
14           .448   .093  -.055   .555   .156   .383  -.084  -.126   .262   .056
15           .416   .142   .078   .526   .306  -.057   .126   .072  -.304   .208
16           .534   .091   .392   .327   .171   .172   .081   .128   .297  -.212
17           .488   .276  -.052   .469  -.255  -.107   .248  -.214  -.152  -.137
18           .544   .386   .198   .152  -.104  -.252  -.019  -.003  -.344  -.287
19           .475   .138   .122   .193  -.604  -.139  -.341   .192   .102   .104
20           .643  -.186   .132   .070   .285  -.191   .026  -.294   .176   .057
21           .622   .232   .100  -.202   .174  -.226  -.161   .176   .323  -.000
22           .640  -.146   .110   .056  -.023  -.331  -.045   .131   .022   .368
23           .712  -.105   .150  -.103   .064  -.111  -.081  -.248   .067   .300
24           .673   .196  -.233  -.062  -.097  -.170  -.228  -.119   .154  -.185
V_p         8.137  2.097  1.692  1.501  1.025   .943   .900   .817   .790   .707
100V_p/24    33.9    8.7    7.0    6.2    4.3    3.9    3.8    3.4    3.3    2.9
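
The rough guide to the error of stopping too soon can be made concrete with the eigenvalues of Table 8.19. The following Python sketch is illustrative only. The function names are hypothetical, and the bound is simply the observation that none of the n - m factors after the m-th can contribute more than the m-th does.

```python
# Eigenvalues of the 24 principal components, copied from Table 8.19.
eigenvalues = [8.135, 2.096, 1.693, 1.502, 1.025, 0.943, 0.901, 0.816,
               0.790, 0.707, 0.639, 0.543, 0.533, 0.509, 0.477, 0.390,
               0.382, 0.340, 0.334, 0.316, 0.297, 0.268, 0.190, 0.172]
total_variance = 24  # unities in the diagonal, one unit per test

# Per cent of total variance contributed by each component.
percent = [100 * v / total_variance for v in eigenvalues]

def remaining_variance_bound(m):
    """Upper bound (per cent) on the variance left after m factors."""
    return (len(eigenvalues) - m) * percent[m - 1]

def remaining_variance_actual(m):
    """Per cent of variance actually carried by the factors after the m-th."""
    return sum(percent[m:])

for m in (1, 5, 10):
    print(m,
          round(sum(percent[:m]), 1),
          round(remaining_variance_actual(m), 1),
          round(remaining_variance_bound(m), 1))
```

The bound is crude for small m, since the later eigenvalues fall well below the last one retained, but it requires nothing beyond the eigenvalues already in hand.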

Psychologists, more often, are concerned with the analysis of the total communality 
rather than the total variance of a given set of variables. Employing the squared 
multiple correlations (SMC’s) as estimates of the communalities, the following 
twenty-four eigenvalues were obtained: 


7.665    .447    .254    .028   -.104   -.199
1.672    .407    .175   -.014   -.127   -.235
1.208    .319    .109   -.048   -.140   -.247
 .920    .305    .046   -.066   -.161   -.269


from which it will be seen that thirteen are positive while eleven are negative—a 
circumstance to be expected when estimates of communalities rather than unities 
are inserted in the main diagonal of a correlation matrix. Since multiplication by the 
square root of the eigenvalue is involved in getting the factor weights, eleven of the 
principal factors are imaginary. For practical interpretation this must mean that
the number of relevant factors necessary to describe the total communality (as
estimated) of the twenty-four tests certainly must be less than or equal to thirteen. 
If the analysis is made in terms of all thirteen real factors, the communality (13.555) 
resulting from this solution will exceed the starting communality (11.943). This 
follows from the mathematical property that the contributions of the eleven imaginary 
factors will be negative and will reduce the contributions of the thirteen real factors 
to the actual amount with which the analysis was started. 

The first five eigenvalues account for just about the total starting communality,
and it might be argued, therefore, that only these factors have any practical
significance. This conclusion also agrees with the criterion of retaining a number of
factors equal to the number of principal components whose eigenvalues are greater
than one (see Table 8.19). It would certainly seem that five factors would provide an
adequate model for the description of the interrelationships among the twenty-four
tests. As a matter of fact, four factors do almost as good a job, and the greater
simplicity might warrant the slight loss in variance. For purposes of reference, all
five principal factors are presented in Table 8.21. While psychological interpretations
might be made of this solution, the more common practice is to transform it to
another frame of reference, eliminating the bipolar factors and causing the variables
identifying the factors to be more pronounced (see chaps. 14, 15).
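
The arithmetic behind these statements can be verified directly from the printed eigenvalues. The sketch below is a plain Python check with hypothetical variable names. The small gap between the computed total and the 11.943 quoted in the text is attributed to the rounding of the printed eigenvalues to three decimals.

```python
# Eigenvalues with SMC's in the diagonal, in descending order (from the text).
smc_eigenvalues = [7.665, 1.672, 1.208, 0.920, 0.447, 0.407, 0.319, 0.305,
                   0.254, 0.175, 0.109, 0.046, 0.028, -0.014, -0.048, -0.066,
                   -0.104, -0.127, -0.140, -0.161, -0.199, -0.235, -0.247, -0.269]

# Eigenvalues with unities in the diagonal (Table 8.19).
component_eigenvalues = [8.135, 2.096, 1.693, 1.502, 1.025, 0.943, 0.901, 0.816,
                         0.790, 0.707, 0.639, 0.543, 0.533, 0.509, 0.477, 0.390,
                         0.382, 0.340, 0.334, 0.316, 0.297, 0.268, 0.190, 0.172]

n_positive = sum(1 for v in smc_eigenvalues if v > 0)
positive_sum = sum(v for v in smc_eigenvalues if v > 0)   # all real factors
starting_communality = sum(smc_eigenvalues)               # trace of reduced matrix
first_five_share = 100 * sum(smc_eigenvalues[:5]) / starting_communality

# Criterion: retain as many factors as there are components with eigenvalue > 1.
n_retained = sum(1 for v in component_eigenvalues if v > 1)

print(n_positive, round(positive_sum, 3), round(starting_communality, 3))
print(round(first_five_share, 1), n_retained)
```

The negative eigenvalues are what bring the contribution of the thirteen real factors back down to the starting communality, and the first five real factors alone already account for nearly all of it.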

8.8. Canonical Form 

In this section, a general procedure is suggested for designating a coordinate system 
in a given factor space with properties closely related to those of the principal-factor 
method. As is well known (see 2.7), a factor solution for a correlation matrix usually 
produces a unique factor space but not a unique set of common-factor loadings 
(exception being the principal-factor solution). For this reason it is desirable to specify 
a “canonical” form. Of course, rotation to such form has nothing to do with the 
“rotation problem” to obtain a more meaningful solution, which is treated in part iii 
of this text. Rotation to canonical form is merely a means to bring an arbitrary 
solution to a well-defined form in a mathematical sense. Among other values, it may 
be useful in resolving the question of the meaning of equivalence of two solutions— 
they may look different, but if they are truly equivalent, then, when each is brought 
to canonical form, they will be identical. 

The canonical form employed in this text (especially in chaps. 9 and 10) has the
property that successive factors account for maximum possible variance in the 
common-factor space determined by the original solution. This should not be 
confused with the alternative objective set forth in 2.3 of determining a solution, 
i.e., the common-factor space, with the property of extracting the maximum variance 
from the observed variables which has been the primary subject of this chapter. 

Table 8.21

Principal-Factor Solution for Twenty-Four Psychological Tests
(Communality estimates: SMC's)

                             Common Factors                       Communality
Test                  P1      P2      P3      P4      P5    Original  Calculated
1                   .595   -.039    .369   -.184   -.073        .511        .531
2                   .374    .026    .210   -.147    .121        .300        .250
3                   .433    .115    .396   -.112   -.276        .440        .446
4                   .501    .108    .290   -.178    .044        .409        .380
5                   .701    .312   -.273   -.050    .003        .673        .666
6                   .683    .404   -.213    .067   -.102        .677        .690
7                   .676    .412   -.284   -.082   -.046        .684        .716
8                   .680    .206   -.088   -.115   -.118        .564        .540
9                   .690    .446   -.212    .076   -.036        .713        .727
10                  .456   -.469   -.446   -.128    .105        .579        .654
11                  .589   -.372   -.198    .076   -.185        .541        .565
12                  .448   -.491   -.154   -.263    .044        .537        .537
13                  .590   -.268    .018   -.300   -.255        .539        .575
14                  .435   -.063   -.012    .418   -.057        .358        .371
15                  .390   -.102    .055    .362    .101        .293        .307
16                  .512   -.098    .325    .259    .006        .429        .444
17                  .471   -.212   -.036    .388   -.087        .412        .426
18                  .521   -.331    .118    .145    .028        .443        .417
19                  .450   -.115    .110    .167   -.178        .367        .287
20                  .623    .135    .142    .049    .252        .464        .492
21                  .596       …       …       …       …        .473        .471
22                  .600    .103    .138    .053    .142        .449        .413
23                  .685       …       …       …    .154        .561        .532
24                  .635   -.169   -.192       …       …        .527        .475
V_p                7.665   1.672   1.208    .920    .447      11.943      11.912
100 V_p/11.943      64.2    14.0    10.1     7.7     3.7        100.        99.7

The canonical form is arrived at in the following manner. Let 

A = arbitrary form of factor matrix (n x m), 

B = canonical form of factor matrix (n x m), 

T = orthogonal transformation matrix (m x m); 

then 

(8.53)  B = AT 

will yield the desired form of the factor solution, and the immediate problem is to 
determine the matrix T. Premultiplying (8.53) by the transpose of B produces 

(8.54)  B'B = T'A'AT. 

Then, setting Λ = B'B and pre- and postmultiplying by T and T', respectively, 
yields 

(8.55)  TΛT' = TT'A'ATT', 

which finally reduces to 

(8.56)  A'A = TΛT', 

since TT' = I from the orthogonality property of the transformation matrix (see 
4.6). The expression (8.56) is a special instance of (8.21) for the diagonalization of 
the symmetric matrix R. By pre- and postmultiplying (8.21) by Q and Q', respectively, 
it becomes 

(8.57)  R = QΛQ'. 

The similarity to (8.56) is obvious. It follows that Λ is a diagonal matrix of eigenvalues 
and the orthogonal transformation T is the matrix of corresponding eigenvectors 
of the matrix A'A. Thus, it is only necessary to determine the eigenvectors of 
an m x m matrix to obtain the transformation matrix which carries the arbitrary 
pattern matrix A into the canonical form B. The general principal-axes computer 
routines are applicable to this problem. 

Of course, the matrix of residuals remains unchanged whether computed from 
A or B since, according to (2.50), 

(8.58)  R† = BB' = ATT'A' = AA', 

the last equality following from the orthogonality of the transformation matrix. Once 
the canonical form of the solution is obtained, it is again designated […]. 
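The rotation to canonical form described in (8.53) to (8.58) can be sketched numerically. The following is a minimal illustration in Python with NumPy, not a routine from the text: the function name `to_canonical_form` and the 4 x 2 matrix `A` are hypothetical, and the eigenvectors of A'A are obtained with `numpy.linalg.eigh`, one of several routines that could serve.

```python
import numpy as np

def to_canonical_form(A):
    """Rotate an arbitrary orthogonal factor matrix A (n x m) to canonical form.

    Returns (B, T, lam) with B = A @ T as in (8.53), where the columns of T are
    eigenvectors of A'A and lam holds the eigenvalues in descending order.
    """
    A = np.asarray(A, dtype=float)
    # A'A is symmetric (m x m); eigh returns its eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)
    order = np.argsort(eigvals)[::-1]   # largest variance first
    T = eigvecs[:, order]
    return A @ T, T, eigvals[order]

# Hypothetical 4 x 2 factor matrix, used only for illustration.
A = np.array([[0.6, 0.3],
              [0.5, 0.4],
              [0.7, -0.2],
              [0.4, -0.5]])
B, T, lam = to_canonical_form(A)

# B'B is diagonal (8.54), and BB' equals AA' (8.58).
print(np.round(B.T @ B, 6))
print(np.allclose(B @ B.T, A @ A.T))
```

Each column of B is determined only up to sign, since an eigenvector may be multiplied by -1; a sign convention would be needed before two canonical solutions are compared entry by entry.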

8.9. Centroid Method 

This method of factoring a correlation matrix provided a computational compromise 
for the principal-factor method before computers were generally available. 
The centroid method is now primarily of historical interest. The fundamental formula 
of the centroid method was first employed by Burt [51, p. 53] in 1917, but applied to 
the problem of determining a single general factor of the Spearman type. The complete 
centroid method was developed by Thurstone [467] in connection with the analysis 
of large batteries of psychological tests into several common factors. 

1. Formulation of method.—The name of the method connotes its close relationship 
to the mechanical concept of a centroid, or center of gravity. For this reason 
the centroid form of analysis can best be described in geometric terms. As noted 
earlier, the variables may be considered as represented by a set of n vectors which are 
contained in a space of m dimensions, where m is the number of common factors, 
the correlation between any two variables being given in (4.56). The variables may 
also be considered as represented by the m coordinates of the end points of these 
vectors with respect to m mutually orthogonal arbitrary reference axes. Since the 
configuration of the vectors representing the variables completely determines the 
correlations, the reference system may be rotated without any effect on them. The 
arbitrary coordinate system may be rotated so that the first axis of reference passes 
through the centroid point of the set of n points. It is then possible to obtain the 
projection of each vector, i.e., the centroid coordinate of each variable, on this first 
axis of reference through the centroid. 

Starting with a factor pattern of the usual form (2.22), the correlations are reproduced 
by means of equation (2.27) when the common factors are uncorrelated. Then 
making the assumption that the residuals vanish, the observed correlations may be 
written 


(8.59)  $r_{jk} = r'_{jk} = a_{j1}a_{k1} + a_{j2}a_{k2} + \cdots + a_{jm}a_{km}$   $(j, k = 1, 2, \cdots, n)$, 

where m is the number of common factors. The numerical values of $a_{jp}$ $(p = 1, 2, \cdots, m)$ 
are determined by the position of the orthogonal reference axes, since $a_{jp}$ is the pth 
coordinate of variable $z_j$.* In the arbitrary orthogonal reference system, the m 
coordinates for each of the n points are as follows: 

$P_1$: $(a_{11}, a_{12}, \cdots, a_{1p}, \cdots, a_{1m})$ 
  . . . 
$P_k$: $(a_{k1}, a_{k2}, \cdots, a_{kp}, \cdots, a_{km})$ 
  . . . 
$P_n$: $(a_{n1}, a_{n2}, \cdots, a_{np}, \cdots, a_{nm})$ 

* More precisely, this variable should be designated by Zj since it is represented in the common 
factor space. See 4.10. 

Any one of the m coordinates of the centroid is the average of the corresponding 
coordinates of these n points, hence 

(8.60)  Centroid: $\left(\frac{1}{n}\sum a_{k1}, \frac{1}{n}\sum a_{k2}, \cdots, \frac{1}{n}\sum a_{km}\right)$, 

where the summation is from 1 to n on the indicated index when the limits are not 
stated. 

Now let the frame of reference be so selected that the first axis $F_1$ passes through 
the centroid. Then the centroid will have coordinates all zero except the first, i.e., 

(8.61)  $\sum a_{k2} = \sum a_{k3} = \cdots = \sum a_{km} = 0.$ 

The m values (8.60) then reduce to 

$\frac{1}{n}\sum a_{k1}, 0, 0, \cdots, 0,$ 

there being (m - 1) zeros. Since the centroid lies in the first axis, the first coordinate 
is also the distance of the centroid from the origin. 

It is now possible to determine the coefficients of the first centroid factor, i.e., the 
coordinates in terms of the observed correlations. Thus, summing for all variables 
k in a fixed column j of the correlation matrix, there results 

$\sum_k r_{jk} = a_{j1}\left(\sum a_{k1}\right) + a_{j2}\left(\sum a_{k2}\right) + \cdots + a_{jm}\left(\sum a_{km}\right),$ 

which, on applying (8.61), becomes 

(8.62)  $\sum_k r_{jk} = a_{j1}\left(\sum_k a_{k1}\right).$ 

Then the sum of all the entries in the correlation matrix is simply 

(8.63)  $\sum_j \sum_k r_{jk} = \sum_j a_{j1} \sum_k a_{k1} = \left(\sum_k a_{k1}\right)^2.$ 

Now, the square root of the left side of (8.63) may be substituted for the term in 
parentheses in (8.62) to yield the following basic centroid formula: 

(8.64)  $a_{j1} = \frac{\sum_k r_{jk}}{\sqrt{\sum_j \sum_k r_{jk}}} = \frac{S_j}{\sqrt{T}}$   $(j = 1, 2, \cdots, n)$, 

where $S_j$ is the sum of all the correlations in column j of the correlation matrix, and 
T is the total of all the correlations in the matrix, including the diagonal terms in 
both of these sums. In the above formula, the positive root was selected arbitrarily. 
Of course, if the negative sign of the radical had been chosen, the coefficients of the 
factor would all be changed in sign, yielding an equally acceptable factor. Formula 
(8.64) gives the coefficient of the first centroid factor $F_1$ for each variable $z_j$, or the 
first coordinate for each point representing a variable. 
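Formula (8.64) amounts to dividing each column sum by the square root of the grand total. The short Python sketch below illustrates this under stated assumptions: the function name `first_centroid_loadings` and the 3 x 3 matrix are hypothetical, and the matrix is deliberately of rank one (built from the coordinates .7, .6, .5) so that the first centroid factor recovers those coordinates exactly.

```python
import numpy as np

def first_centroid_loadings(R):
    """First centroid factor coefficients a_j1 = S_j / sqrt(T), formula (8.64).

    R is a symmetric correlation matrix with communality estimates in the diagonal.
    """
    R = np.asarray(R, dtype=float)
    S = R.sum(axis=0)      # column sums S_j, diagonal terms included
    T = S.sum()            # total of all entries in the matrix
    if T <= 0:
        raise ValueError("total T must be positive; reflect variables first")
    return S / np.sqrt(T)  # positive root chosen, as in the text

# Hypothetical rank-one example: R = a a' with a = (.7, .6, .5).
R = np.array([[0.49, 0.42, 0.35],
              [0.42, 0.36, 0.30],
              [0.35, 0.30, 0.25]])
a1 = first_centroid_loadings(R)
print(np.round(a1, 3))
```

The guard on T reflects the remark that the division is possible only when the centroid is not at the origin.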

The next step is to get the first-factor residuals, from which the second coordinates 
are found. Since the residual correlations with one, two, • • •, (m — 1) factors removed 
are employed in successive stages of the centroid method, the notation introduced 
in 8.3 is employed again. The first-factor residuals again are given by 

(8.24)  ${}_1r_{jk} = r_{jk} - a_{j1}a_{k1} = a_{j2}a_{k2} + a_{j3}a_{k3} + \cdots + a_{jm}a_{km}.$ 

The residual correlations may be regarded as the scalar products of pairs of residual 
vectors in a space of (m — 1) dimensions—the dimension of the residual space being 
equal to the number of terms in the right-hand member of (8.24) or the rank of the 
matrix of residual correlations, according to Theorem 4.6. 

In this residual space, the (m — 1) coordinates for each of the n points may be 
designated by: 

${}_1P_1$: $(a_{12}, a_{13}, \cdots, a_{1p}, \cdots, a_{1m})$ 
  . . . 
${}_1P_k$: $(a_{k2}, a_{k3}, \cdots, a_{kp}, \cdots, a_{km})$ 
  . . . 
${}_1P_n$: $(a_{n2}, a_{n3}, \cdots, a_{np}, \cdots, a_{nm})$. 

The (m - 1) coordinates of the centroid of these n points are 

(8.65)  $\frac{1}{n}\sum_k a_{kp}$   $(p = 2, 3, \cdots, m)$, 

which all vanish according to (8.61). Thus the centroid is at the origin in this (m - 1) 
space, and formula (8.64) cannot be used directly for calculating the values of the 
second-factor coefficients. It will be noted that in obtaining (8.64) the expression 
(8.62) was divided by $\sum_k a_{k1}$, or n times the distance of the centroid from the origin. 
It was tacitly assumed that the centroid was not at the origin, for otherwise this 
division would not have been possible. 

The immediate problem, then, is to remove the centroid from the origin in the 
(m — 1) space, so that the preceding method can again be applied. By means of 
rotations of certain of the vectors about the origin through 180°—also called reflec¬ 
tions in the origin —the centroid can be removed from the origin. If the coordinates 
of a point $P_j$, representing a variable $z_j$, are 

$(a_{j1}, a_{j2}, \cdots, a_{jm})$, 

then the reflected point $-P_j$, with coordinates 

$(-a_{j1}, -a_{j2}, \cdots, -a_{jm})$, 

represents the variable $-z_j$. Such a variable corresponds to the original variable 
measured in the opposite direction. 

Now it is evident from (8.59) that to reverse the signs of the coordinates of $P_j$ 
has the effect of reversing the signs of all the correlations of variable $z_j$ with the 
remaining ones. Thus the reflection of a variable in the origin is accomplished merely 
by changing the signs of the correlations of this variable in the correlation matrix. 
Of course, the same argument holds for the residual (m - 1) space as for the original 
common-factor space of m dimensions. Hence the reflection of a variable in the 
residual space is accomplished by changing the signs of the residual correlations 
for that variable. 
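The equivalence between reflecting a point and changing signs in the correlation matrix can be checked numerically. The sketch below is only an illustration: the 4 x 2 coordinate matrix `A` is hypothetical, and the correlations are taken to be reproduced exactly by (8.59).

```python
import numpy as np

# Hypothetical coordinates of four points in a two-factor space (illustration only).
A = np.array([[0.6, 0.3],
              [0.5, 0.4],
              [0.7, -0.2],
              [0.4, -0.5]])
R = A @ A.T                   # reproduced correlations, as in (8.59)

j = 2                         # reflect the third variable (zero-based index)
A_reflected = A.copy()
A_reflected[j, :] *= -1       # coordinates of the reflected point -P_j
R_from_coords = A_reflected @ A_reflected.T

# The same matrix obtained by changing signs directly in the correlation matrix.
R_sign_change = R.copy()
R_sign_change[j, :] *= -1
R_sign_change[:, j] *= -1     # the diagonal entry is flipped twice, so it is unchanged

print(np.allclose(R_from_coords, R_sign_change))
```

The diagonal entry of the reflected variable keeps its sign, which is why the communality is unaffected by reflection.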

In an attempt to determine which variables to reflect, Thurstone [468, p. 96] 
suggests that “it is desirable to account for as much as possible of the residual variance 
by each successive factor.” While a rigorous mathematical application of this 
principle would lead to the principal-factor solution, the centroid procedure is only 
intended to approximate it. To have the second factor account for as much as possible 
of the residual variance, the reference axis which represents this factor should pass 
through a cluster of residual vectors. If there is a clustering of such vectors (i.e., a 
group of variables having high positive residual correlations), which is balanced by 
a scattering of vectors on the opposite side of the origin, since the centroid is at the 
origin, then the second reference axis should be made to pass through this cluster. 
Thus it would seem that the vectors which scatter, opposite to a cluster, should be 
reflected so as to fall in with the group. The second centroid coordinates can then be 
computed as in the first case. In application, those variables which have the greatest 
number of negative correlations would be reflected first, bringing them into the 
hemisphere of the cluster. For practical problems, Thurstone [468, p. 97] suggests 
reversing “the signs of one trait at a time until the number of negative coefficients 
in the residual table is less than n/2”, that is, less than n/2 negative signs for any 
one variable, not the entire table. One need not stop with this, however, for, if it is 
desired to further “maximize” the variance removed by each successive factor, the 
reflections of variables may be continued until the sum of the residual correlations 
for each variable is as large (positively) as possible. 

For the remainder of the analytical work it will be convenient to use a symbol to 
designate whether a point, representing a variable, has been reflected in the origin. 
Let $\epsilon_j$ stand for the algebraic sign of $P_j$, that is, the point is $+P_j$ or $-P_j$. If $P_j$ 
has not been reflected, $\epsilon_j$ is plus, but, if $P_j$ has been reflected, then $\epsilon_j$ is minus. 
Furthermore, $\epsilon_j$ may be considered as an algebraic operator defined as follows: 

$\epsilon_j = +1$ if $z_j$ has not been reflected, 
$\epsilon_j = -1$ if $z_j$ has been reflected. 

Then $\epsilon_j$ can be attached to the coordinates of $P_j$, and it can be treated as any other 
algebraic quantity. Thus, if the first-factor residual correlations after reflection of 
certain variables are designated by $r_{1jk}$ (in distinction to ${}_1r_{jk}$ before reflection), then 
they may be written as follows: 

(8.66)  $r_{1jk} = \epsilon_j\epsilon_k(a_{j2}a_{k2} + a_{j3}a_{k3} + \cdots + a_{jm}a_{km}).$ 

This result follows immediately from (8.24), where each $a_{jp}$ and $a_{kp}$ was replaced by 
$\epsilon_j a_{jp}$ and $\epsilon_k a_{kp}$, respectively, and then $\epsilon_j$ and $\epsilon_k$ were factored out algebraically. If 
neither $z_j$ nor $z_k$ was reflected, or if both variables were reflected, then $r_{1jk} = {}_1r_{jk}$; 
but, if only one or the other of $z_j$ and $z_k$ was reflected, then $r_{1jk} = -{}_1r_{jk}$. In other 
words, $r_{1jk} = \epsilon_j\epsilon_k({}_1r_{jk})$. 

The (m - 1) coordinates of the centroid of the n points, originally given in (8.65), 
become after reflection of variables: 

$\frac{1}{n}\sum_k \epsilon_k a_{kp}$   $(p = 2, 3, \cdots, m)$. 

Now the system of reference can be rotated about the first axis $F_1$ so that the second 
axis $F_2$ passes through this centroid.* Let it be assumed that this has been done, 
there being no need to change the notation for the coordinates. Then the coordinates 
of the centroid are 

$\frac{1}{n}\sum_k \epsilon_k a_{k2}, 0, 0, \cdots, 0,$ 

since the centroid lies on the $F_2$ axis. From the values for the last (m - 2) coordinates 
the following useful expressions may be written: 

(8.67)  $\sum_k \epsilon_k a_{k3} = \sum_k \epsilon_k a_{k4} = \cdots = \sum_k \epsilon_k a_{km} = 0,$ 

corresponding to (8.61) in the case of the first centroid. 

* The residual (m - 1) space is orthogonal to the first axis of reference $F_1$. The second axis $F_2$, 
i.e., the first one in the (m - 1) subspace, may then be rotated to any position in the residual 
space, and it will be at right angles to $F_1$. 

Now the projections of the vectors on the second centroid axis can be expressed 
in terms of the residual correlations. Thus, summing (8.66) for all variables k in a 
fixed column j of the residual correlation matrix (after reflection of variables) and 
applying (8.67), there results 

$\sum_k r_{1jk} = \epsilon_j a_{j2} \sum_k \epsilon_k a_{k2}.$ 

Then, summing for all columns, 

$\sum_j \sum_k r_{1jk} = \sum_j \epsilon_j a_{j2} \sum_k \epsilon_k a_{k2} = \left(\sum_k \epsilon_k a_{k2}\right)^2.$ 

It follows that 

$\epsilon_j a_{j2} = \frac{\sum_k r_{1jk}}{\sqrt{\sum_j \sum_k r_{1jk}}} = \frac{S_{j1}}{\sqrt{T_1}},$ 

or, multiplying both sides by $\epsilon_j$, 

(8.68)  $a_{j2} = \frac{\epsilon_j S_{j1}}{\sqrt{T_1}}$   $(j = 1, 2, \cdots, n)$, 

where $S_{j1}$ is the sum of all the entries in column j of the matrix of first-factor residual 
correlations and $T_1$ is the total of all the correlations in this matrix, the signs of all 
the entries being those after reflection. The $\epsilon_j$ indicates that, if the variable $z_j$ was 
reflected, then the algebraic sign must be changed, but, if the variable was not 
reflected, then $\epsilon_j$ is merely +1. In other words, $\epsilon_j S_{j1}$ is the sum of all residual 
correlations for the unreflected variable $z_j$ with all other variables. Hence, by defining 

(8.69)  ${}_1S_j = \epsilon_j S_{j1},$ 

formula (8.68) may be put in the form 

(8.70)  $a_{j2} = \frac{{}_1S_j}{\sqrt{T_1}}$   $(j = 1, 2, \cdots, n)$. 

In this formula the numerator refers to the sum of the residual correlations for the 
unreflected variable j, while in the denominator $T_1$ still stands for the sum of all 
residual correlations after the sign changes. Formula (8.70) gives the coefficients of 
the second centroid factor $F_2$ for each variable. 
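Formulas (8.66) to (8.70) can be collected into a few lines of Python. This is a sketch under stated assumptions, not a procedure from the text: the function name `centroid_loadings_after_reflection`, the 4 x 4 residual matrix, and the choice of reflected variables are all hypothetical. The residual matrix was constructed so that its column sums vanish, as they must after the first centroid factor is removed.

```python
import numpy as np

def centroid_loadings_after_reflection(resid, eps):
    """Centroid coefficients from a residual matrix, following (8.68).

    resid : residual correlations BEFORE reflection (symmetric, n x n)
    eps   : +1 for an unreflected variable, -1 for a reflected one
    """
    resid = np.asarray(resid, dtype=float)
    eps = np.asarray(eps, dtype=float)
    reflected = resid * np.outer(eps, eps)   # r_1jk = eps_j eps_k (1r_jk), as in (8.66)
    S = reflected.sum(axis=0)                # column sums S_j1 after reflection
    T1 = S.sum()                             # total T_1 after reflection
    if T1 <= 0:
        raise ValueError("T_1 must be positive; choose other reflections")
    return eps * S / np.sqrt(T1)             # a_j2 = eps_j S_j1 / sqrt(T_1)

# Hypothetical first-factor residuals with zero column sums (illustration only).
resid = np.array([[ 0.3,  0.1, -0.2, -0.2],
                  [ 0.1,  0.3, -0.2, -0.2],
                  [-0.2, -0.2,  0.3,  0.1],
                  [-0.2, -0.2,  0.1,  0.3]])
eps = np.array([-1, -1, 1, 1])               # assume the first two variables were reflected
a2 = centroid_loadings_after_reflection(resid, eps)
print(np.round(a2, 4))
```

Reflecting the other pair of variables instead would reverse the signs of all four coefficients, which yields an equally acceptable factor.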

The remaining factor weights can be obtained in a similar manner. To follow the 
basic principle of accounting for as much as possible of the variances of the variables 
by each factor, the variables are reflected in the residual subspaces to bring them into 
clusters. When the sign changes have been made, the centroid of the system of points 
in the residual space lies somewhere in the cluster of variables, and the next reference 
axis is selected through this centroid. Each successive centroid axis is at right angles 
to every one of the preceding axes because the residual space is orthogonal to the 
space of the centroid axes already established. Upon extracting each centroid factor, 
the residual correlations are reduced in magnitude, and the rank of each residual 
matrix is reduced by one, theoretically. The foregoing development was made 
without any restrictions on the diagonal elements. The number of factors ultimately 
obtained is dependent upon these diagonal values, leading to the question of “when 
to stop factoring.” 

If the simplest of the arbitrary estimates of communality (see 5.6)— the largest 
correlation in each column of the correlation matrix—is selected for the diagonal 
element, then in each subsequent residual matrix the calculated diagonal term is 
not retained but replaced by the largest residual correlation, regardless of sign, 
in each column. This procedure does not furnish a standard for determining the 
number of common factors. 

If, instead of modifying the diagonal entries at each stage, the analysis is applied 
directly, keeping the diagonal terms as calculated, then the number of factors is 
determined when the original diagonal values have been completely resolved. Only 
the common factors will be obtained if the correlation matrix contains communalities 
in the principal diagonal and these values are completely analyzed. In practice, it is 
recommended that appropriate complete estimates of the communalities, discussed 
in 5.7, be employed. When such estimates are used and the straightforward centroid 
analysis is applied, the resulting solution terminates in a definite number of common 
factors. 

2. Computing procedures. —When a high-speed electronic computer is available, 
there is no need to accept a substitute for the principal-factor solution. Without such 
facilities, however, a reasonable compromise is the centroid method using a desk 
calculator. The calculation of a centroid solution is demonstrated below, using the 
first thirteen of the twenty-four psychological tests introduced in 7.6. The correla¬ 
tions, taken from Table 7.4, are recorded below the diagonal in Table 8.22. The 
communality estimates in the diagonal were available from a bi-factor solution of 
the thirteen tests. 

To get the first-factor coefficients by (8.64), the sums $S_j$ of correlations by columns 
and the total T of all the correlations in the matrix R are required. The column sums 
are shown at the bottom of Table 8.22, and are obtained by adding all the numbers 
in a particular row out to the diagonal and then down the corresponding column. 
The total T = 57.314 is obtained simply as the sum of all the $S_j$. Then, dividing $S_j$ by 
$\sqrt{T}$ = 7.5706 produces the coefficients of the first centroid factor shown in the last 
line of Table 8.22. A check on the computation is available by making use of the 
sum of the factor coefficients, denoted by $D_1$, as follows: 

(8.71)  $D_1 = \sum_j a_{j1} = \sum_j S_j/\sqrt{T} = T/\sqrt{T} = \sqrt{T}.$ 

Since $D_1$ = 7.571 the computational accuracy is substantiated. 
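The desk-calculator steps above can be retraced in a few lines of Python with NumPy. The sketch below is an illustration only: the variable names are hypothetical, and the lower triangle is transcribed from Table 8.22 on the assumption that the correlation between tests 3 and 10 is -.075, the value that agrees with the printed column sums $S_3$ and $S_{10}$ and with the residual -.265 recorded in Table 8.24.

```python
import numpy as np

# Lower triangle (diagonal = communality estimates) transcribed from Table 8.22.
# Assumption: the entry for tests 10 and 3 is taken as -.075 (see the note above).
LOWER = [
    [.558],
    [.318, .203],
    [.403, .317, .362],
    [.468, .230, .305, .314],
    [.321, .285, .247, .227, .646],
    [.335, .234, .268, .327, .622, .641],
    [.304, .157, .223, .335, .656, .722, .750],
    [.332, .157, .382, .391, .578, .527, .619, .571],
    [.326, .195, .184, .325, .723, .714, .685, .532, .758],
    [.116, .057, -.075, .099, .311, .203, .246, .285, .170, .554],
    [.308, .150, .091, .110, .344, .353, .232, .300, .280, .484, .449],
    [.314, .145, .140, .160, .215, .095, .181, .271, .113, .585, .428, .531],
    [.489, .239, .321, .327, .344, .309, .345, .395, .280, .408, .535, .512, .599],
]

n = len(LOWER)
R = np.zeros((n, n))
for j, row in enumerate(LOWER):
    R[j, :j + 1] = row
R = R + np.tril(R, -1).T          # fill the upper triangle by symmetry

S = R.sum(axis=0)                 # column sums S_j
T = S.sum()                       # total of all entries
a1 = S / np.sqrt(T)               # first centroid coefficients, formula (8.64)
D1 = a1.sum()                     # check value of (8.71)

print(np.round(S, 3))
print(round(T, 3), round(float(np.sqrt(T)), 4), round(float(D1), 3))
print(np.round(a1, 3))
```

Because the check (8.71) holds identically in exact arithmetic, it detects slips in addition or division but not an error in copying a correlation into the table.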

The procedure for getting the first-factor residuals in the centroid method is 
precisely the same as that outlined in 8.5, paragraph 4 above. The product matrix $R_1$ 
for the data is given in Table 8.23. 


Table 8.22

Calculation of the $C_1$ Coefficients from the Correlation Matrix R

Variable        1      2      3      4      5      6      7      8      9     10     11     12     13
1            .558
2            .318   .203
3            .403   .317   .362
4            .468   .230   .305   .314
5            .321   .285   .247   .227   .646
6            .335   .234   .268   .327   .622   .641
7            .304   .157   .223   .335   .656   .722   .750
8            .332   .157   .382   .391   .578   .527   .619   .571
9            .326   .195   .184   .325   .723   .714   .685   .532   .758
10           .116   .057  -.075   .099   .311   .203   .246   .285   .170   .554
11           .308   .150   .091   .110   .344   .353   .232   .300   .280   .484   .449
12           .314   .145   .140   .160   .215   .095   .181   .271   .113   .585   .428   .531
13           .489   .239   .321   .327   .344   .309   .345   .395   .280   .408   .535   .512   .599
S_j         4.592  2.687  3.168  3.618  5.519  5.350  5.455  5.340  5.285  3.443  4.064  3.690  5.103
a_j1         .607   .355   .418   .478   .729   .707   .721   .705   .698   .455   .537   .487   .674


The elements of this matrix are checked by means of the following relationship 
between the sums of complete rows $E_{j1}$ and $D_1$: 

(8.72)  $E_{j1} = \sum_k a_{j1}a_{k1} = a_{j1}\sum_k a_{k1} = a_{j1}D_1$   $(j = 1, 2, \cdots, n)$. 

This check is shown for each of the thirteen variables in Table 8.23. 
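The row-sum check (8.72) is easy to reproduce. The following Python sketch is an illustration with hypothetical variable names; it takes the first-factor coefficients from the last line of Table 8.22 and compares the complete row sums of the product matrix with $a_{j1}D_1$.

```python
import numpy as np

# First centroid coefficients a_j1, copied from the last line of Table 8.22.
a1 = np.array([.607, .355, .418, .478, .729, .707, .721,
               .705, .698, .455, .537, .487, .674])

product = np.outer(a1, a1)     # product matrix (a_j1 a_k1)
E = product.sum(axis=1)        # sums of complete rows, E_j1
D1 = a1.sum()                  # sum of the factor coefficients
check = a1 * D1                # right-hand member of (8.72)

print(round(float(D1), 3))
print(np.round(E, 3))
print(np.allclose(E, check))
```

In Table 8.23 the two lines agree only to within a unit or two in the third decimal place, because the row sums there are formed from products already rounded to three places.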


Table 8.23

Product Matrix: $R_1 = (a_{j1}a_{k1})$

Variable        1      2      3      4      5      6      7      8      9     10     11     12     13
1            .368
2            .215   .126
3            .254   .148   .175
4            .290   .170   .200   .228
5            .443   .259   .305   .348   .531
6            .429   .251   .296   .338   .515   .500
7            .438   .256   .301   .345   .526   .510   .520
8            .428   .250   .295   .337   .514   .498   .508   .497
9            .424   .248   .292   .334   .509   .493   .503   .492   .487
10           .276   .162   .190   .217   .332   .322   .328   .321   .318   .207
11           .326   .191   .224   .257   .391   .380   .387   .379   .375   .244   .288
12           .296   .173   .204   .233   .355   .344   .351   .343   .340   .222   .262   .237
13           .409   .239   .282   .322   .491   .477   .486   .475   .470   .307   .362   .328   .454
E_j1        4.596  2.688  3.166  3.619  5.519  5.353  5.459  5.337  5.285  3.446  4.066  3.688  5.102
a_j1 D_1    4.596  2.688  3.165  3.619  5.519  5.353  5.459  5.338  5.285  3.445  4.066  3.687  5.103

The first-factor residuals are shown in Table 8.24. In this residual-factor space the 
centroid must be at the origin, and hence the column sums should be zero except 
for rounding errors. To remove the centroid from the origin in the residual-factor 
space, and to increase the contribution of the second factor to the residual variance, 
certain variables are reflected in the origin. The variables to be reflected are deter¬ 
mined as in Table 8.25. In the column headed “Reflected Variable,” a minus sign 
is placed opposite any variable which is to be reflected. The remaining columns are 
introduced successively as variables are reflected. The process is begun by counting 
the number of negative signs for each variable in the residual matrix of Table 8.24 
and recording in the column headed “Before Reflection.” It should be noted that, 
although only half the symmetric residual matrix is recorded, the number of negative 
signs to be considered for each variable is that of the total matrix. In other words, 
when counting the number of negative signs read across the row and down the 
column for a specified variable. 

Select the variable with the largest number of negative signs to be reflected first. 
If several variables have the same maximum number of negative signs, any one of 
them may be arbitrarily selected for reflection. In the example, variables 10 and 11 
each have 9 negative signs, and $z_{10}$ is selected for reflection. Opposite variable 10 
in the first column of Table 8.25 place a minus sign to indicate that this variable is 
to be reflected. An adjustment in the number of negative signs for each variable is 
made as if variable 10 were reflected in Table 8.24 (i.e., as if all the signs for variable 10 
were changed); and these results are recorded in the column headed “10” to indicate 
that the count of negative signs for each variable is that after variable 10 is reflected. 

Upon reflection of a given variable, every residual which was positive becomes 
negative and every negative residual becomes positive, except that the value in the 
diagonal of the residual matrix remains unchanged. Therefore, for the variable being 
reflected the adjusted number of negative signs is (n — 1) minus the number of negative 
signs it had before reflection. In the example, n — 1 = 12 and the entry for variable 10 
after reflection is 12 — 9 = 3. 

It is not necessary to change all the signs of the residuals for the variable being 
reflected in order to count the number of negative signs for the other variables 
after the reflection. Instead, consider the sign of each entry except the diagonal in 
the row and column of Table 8.24 for the variable being reflected, and proceed as 
follows. 

(a) If the entry for a particular variable, which was not previously reflected, is 
positive, increase by one the number of negative signs for that variable and record 
the new value for that variable in the next column of Table 8.25. For example, the 
entry for variable 11 in column 10 of Table 8.24 is positive, and since $z_{11}$ was not 
previously reflected, the number of negative signs for it is increased one, from 9 to 10, 
in Table 8.25 after variable 10 is reflected. 

(b) If the entry for a particular variable, which was not previously reflected, is 
negative, decrease by one the number of negative signs for that variable and record 
the new value for that variable in the next column of Table 8.25. For example, the 
entry for variable 1 in row 10 of Table 8.24 is negative, and since $z_1$ has not been 
reflected, the number of negative signs for it is decreased one, producing 6 negative 
values after variable 10 is reflected. 

Table 8.24

First-Factor Residuals: ${}_1r_{jk}$

Variable        1      2      3      4      5      6      7      8      9     10     11     12     13
1            .190
2            .103   .077
3            .149   .169   .187
4            .178   .060   .105   .086
5           -.122   .026  -.058  -.121   .115
6           -.094  -.017  -.028  -.011   .107   .141
7           -.134  -.099  -.078  -.010   .130   .212   .230
8           -.096  -.093   .087   .054   .064   .029   .111   .074
9           -.098  -.053  -.108  -.009   .214   .221   .182   .040   .271
10          -.160  -.105  -.265  -.118  -.021  -.119  -.082  -.036  -.148   .347
11          -.018  -.041  -.133  -.147  -.047  -.027  -.155  -.079  -.095   .240   .161
12           .018  -.028  -.064  -.073  -.140  -.249  -.170  -.072  -.227   .363   .166   .294
13           .080   .000   .039   .005  -.147  -.168  -.141  -.080  -.190   .101   .173   .184   .145
Sum         -.004  -.001   .002  -.001   .000  -.003  -.004   .003   .000  -.003  -.002   .002   .001

Table 8.24a

Sign changes*

* Original signs below diagonal; signs after reflection of variables above diagonal. 

Table 8.24b

Residuals after Reflection ($r_{1jk}$) and Calculation of $C_2$ Coefficients

Variable       -1      2      3      4      5      6      7      8      9    -10    -11    -12    -13
1            .190  -.103  -.149  -.178   .122   .094   .134   .096   .098  -.160  -.018   .018   .080
2                   .077   .169   .060   .026  -.017  -.099  -.093  -.053   .105   .041   .028   .000
3                          .187   .105  -.058  -.028  -.078   .087  -.108   .265   .133   .064  -.039
4                                 .086  -.121  -.011  -.010   .054  -.009   .118   .147   .073  -.005
5                                        .115   .107   .130   .064   .214   .021   .047   .140   .147
6                                               .141   .212   .029   .221   .119   .027   .249   .168
7                                                      .230   .111   .182   .082   .155   .170   .141
8                                                             .074   .040   .036   .079   .072   .080
9                                                                    .271   .148   .095   .227   .190
10                                                                          .347   .240   .363   .101
11                                                                                 .161   .166   .173
12                                                                                        .294   .184
13                                                                                               .145
S_j1         .224   .141   .550   .309   .954  1.311  1.360   .729  1.516  1.785  1.446  2.048  1.365
1S_j        -.224   .141   .550   .309   .954  1.311  1.360   .729  1.516 -1.785 -1.446 -2.048 -1.365
a_j2        -.060   .038   .148   .083   .257   .354   .367   .197   .409  -.482  -.390  -.553  -.368

Table 8.25

Number of Minus Signs for First-Factor Residuals after Successive 
Reflections of Variables

              Reflected    Before         After Reflection of Successive Variable
Variable      Variable     Reflection       10      11      12      13       1
1                 -             7            6       5       6       7       5
2                               7            6       5       4       5       6
3                               7            6       5       4       5       6
4                               7            6       5       4       5       6
5                               7            6       5       4       3       2
6                               8            7       6       5       4       3
7                               8            7       6       5       4       3
8                               6            5       4       3       2       1
9                               8            7       6       5       4       3
10                -             9            3       2       1       0       1
11                -             9           10       2       1       0       1
12                -             8            9      10       2       1       0
13                -             5            6       7       8       4       3
Total                          96           84      68      52      44      40
Difference                                  12      16      16       8       4


General rules for sign changes are formulated conveniently in Table 8.26. 

After one variable has been reflected, proceed to the next column of Table 8.25 
and again select the variable with the largest number of negative signs for reflection. 
This is variable 11, with 10 negative signs, and the next column is headed “11”. 
Adjust the number of negative signs for each variable as if $z_{11}$ were reflected in Table 
8.24, following the procedure outlined above, and record the results in Table 8.25. 
The reflection of variables is continued until the number of negative residuals for each 
variable is less than half the total number. In the example n = 13 so that the reflections 
are carried to the point where there are six or fewer negative signs for each variable. 
This is accomplished after five variables have been reflected. 

Several exceptional situations should be noted. If zero values should appear in 
any of the correlation or residual tables, they may be treated as positive numbers 
in making sign adjustments for the reflection of variables. The diagonal values of the 
residual tables are not considered in the count of negative signs, for, if a variable is 
reflected, its “self-correlation” remains unchanged. It may happen that a variable 
which had already been reflected may again appear as the variable with a maximum 
number of negative signs after several other variables have been reflected. In this 
case the variable is reflected again, changing the minus to plus in the first column of 
Table 8.25, and the number of minus signs is adjusted for each of the variables. 

Table 8.26

Rules for Sign-Change Adjustments

                                           Entry in Row (or Column) of
                                           Reflected Variable Is
Previous Reflections                       Positive          Negative

Not previously reflected (or reflected
an even number of times)                   Increase one      Decrease one

Previously reflected once (or any odd
number of times)                           Decrease one      Increase one

The variables having minus signs in the first column of Table 8.25 now may
actually be reflected in Table 8.24. In order to make the procedure perfectly clear, the
additional Tables 8.24a and 8.24b are provided.* First place a minus sign before the
column number of each variable which is to be reflected, i.e., before variables 1, 10,
11, 12, and 13 in Table 8.24a. Then the signs of the original first-factor residuals ${}_1r_{jk}$
may be changed to obtain the proper algebraic signs of the residuals after reflection
of variables, i.e.,

(8.73)    ${}_1r'_{jk} = \epsilon_j \epsilon_k \, {}_1r_{jk}.$

Since the epsilons are merely algebraic symbols for the plus or minus signs, if neither
$z_j$ nor $z_k$ was reflected, or if both variables were reflected, then ${}_1r'_{jk} = {}_1r_{jk}$; while if
only one or the other of $z_j$ and $z_k$ was reflected, then ${}_1r'_{jk} = -{}_1r_{jk}$. First go through
the upper half of Table 8.24 (recorded separately as 8.24a for instructional purposes
only), one row at a time, and insert minus signs according to the above rules. A
convenient procedure is to look at each entry of the first column, note the adjusted
sign, and when this sign should be minus, record a minus sign in the corresponding
cell of the first row of the upper half of the table. Check the total number of minus
signs for the first variable with the number given for that variable in Table 8.25.
Then proceed to the second column of the lower half of the table, note the sign changes,
and record the minus signs in the second row of the upper half of the table. The count
of six minus signs for variable 2, in the second column and second row of the upper
half of Table 8.24a, agrees with that given in the last column of Table 8.25. Continue
this process of sign changes for every variable in Table 8.24a. As an additional check,
the total number of minus signs in the upper half of Table 8.24a must be equal to
one-half of the total given in the last column of Table 8.25.

Now, merely copy the values (without any algebraic signs) from the columns of
the lower half of Table 8.24 into the corresponding rows of the upper half of Table
8.24 (actually done in Table 8.24b for the example). The values so obtained are the
residuals of the reflected variables.
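The sign rule of (8.73) can also be applied in one pass once the set of reflected variables is known. The sketch below is illustrative only: `reflected_residuals` is a hypothetical name, variables are indexed from 0, and the 4-variable matrix is made up rather than taken from Table 8.24.

```python
def reflected_residuals(resid, reflected):
    """Apply r'_jk = eps_j * eps_k * r_jk, with eps = -1 for reflected variables."""
    n = len(resid)
    eps = [-1 if j in reflected else 1 for j in range(n)]
    # On the diagonal eps_j * eps_j = 1, so self-residuals keep their sign.
    return [[eps[j] * eps[k] * resid[j][k] for k in range(n)] for j in range(n)]


# Hypothetical residuals; variables 0 and 1 are the ones to be reflected.
resid = [
    [0.2, 0.1, -0.1, -0.2],
    [0.1, 0.3, -0.1, -0.1],
    [-0.1, -0.1, 0.2, 0.1],
    [-0.2, -0.1, 0.1, 0.3],
]
after = reflected_residuals(resid, reflected={0, 1})
```

An entry changes sign only when exactly one of its two variables is reflected, which is the rule stated in the text.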


* In practice, however, these additional tables may be obviated by incorporating them in
Table 8.24. That procedure is indicated by Table 8.27, in which the second-factor residuals are
shown.
reflection (in the upper half of Table 8.24). [...] reflected. The coefficients
in the last line of Table 8.24b are for [...] factors after the first is [...]
algebraic signs are for those variables that [...] the reflected
ones. A check on the computation [...] be approximately zero for
[...] coefficients add to [...]
of Table 8.27. To remove the centroid from [...] reflected [...] determined in
Table 8.28. After reflection the residuals are shown in the upper triangle of Table 8.27,
and the third-factor coefficients are calculated from these values. Then the third-[...]
foregoing procedure [...] to account for the
total communality.


Table 8.28 

Number of Minus Signs for Second-Factor Residuals after Successive 




Table 8.30

Centroid Solution for Thirteen Psychological Tests

           Common-Factor Coefficients               Communality
                                            (1)         (2)
Test       C_1       C_2       C_3       Original    Calculated   (1) - (2)

  1       .607     -.060     -.443         .558        .568        -.010
  2       .355      .038     -.266         .203        .198         .005
  3       .418      .148     -.429         .362        .381        -.019
  4       .478      .083     -.287         .314        .318        -.004
  5       .729      .257      .244         .646        .657        -.011
  6       .707      .354      .167         .641        .653        -.012
  7       .721      .367      .257         .750        .721         .029
  8       .705      .197      .062         .571        .540         .031
  9       .698      .409      .252         .758        .718         .040
 10       .455     -.482      .399         .554        .599        -.045
 11       .537     -.390      .145         .449        .461        -.012
 12       .487     -.553      .033         .531        .544        -.013
 13       .674     -.368     -.135         .599        .608        -.009

Total; contribution of factor (V_p)
         4.620     1.392      .954        6.936       6.966        -.030
Per cent of total original communality
          66.6      20.1      13.8        100.0       100.4        -0.4
Minres Solution


9.1. Introduction 

As noted in chapter 6, a choice must be made of either communality estimates or 
the dimension of the common-factor space when using the classical factor analysis 
model (2.9). The methods treated in the preceding chapter involve the assumption 
of communalities, while the methods in this and the next chapter require the choice 
of the number of common factors. 

The word minres is a contraction of “minimum residuals,” and designates a
long-sought method for factoring. Now that it is available, it might well replace the
principal-factor and the maximum-likelihood methods for initial factorization of a
correlation matrix. As noted in 2.3, one objective of factor analysis is to “best”
reproduce the observed correlations. This objective can be traced to Thurstone’s
statement: “The object of a factor problem is to account for the tests, or their
intercorrelations, in terms of a small number of derived variables, the smallest possible
number that is consistent with acceptable residual errors” [477, p. 61]. In this chapter,
the factor analysis problem as posed by Thurstone is solved by maximally (in the
least-squares sense) reproducing the off-diagonal elements of the correlation matrix, and,
as a by-product, obtaining communalities consistent with this criterion. The contrast
between this objective and that of extracting maximum variance (treated in chap. 8)
should be clearly understood.

A brief history leading to the current method and the formal statement of the 
problem is given in 9.2. This is followed by the actual development of the method in 
9.3. Special procedures to restrict the derived communalities from exceeding unity 
are developed in 9.4. A discussion of statistical tests for the significance of the number 
of common factors follows in 9.5. Then a brief outline of the computing procedures 
is indicated in 9.6, and numerical illustrations of the minres solution are given in the 
final section. 


9.2. Formulation of the Minres Solution

Conceptually, the idea of getting a factor solution by minimizing the residual 
correlations is an obviously direct approach. The first practical solution, however, 
was not developed until 1965 by Harman and Jones [207]. The idea certainly is not
new; its accomplishment, however, was dependent on the high-speed computer.
No doubt it must have crossed the minds of many workers in factor analysis over the 
years. The first theoretical treatment appeared in 1936, when Eckart and Young 
noted that “if the least-squares criterion of approximation [of one matrix by another 
of lower rank] be adopted, this problem has a general solution which is relatively 
simple in a theoretical sense, though the amount of numerical work involved in 
applications may be prohibitive” [114, p. 211]. This was followed in the next couple 
of years by additional theoretical work by Householder and Young [267] and 
Horst [250]. 

More recently, several papers have appeared that seem to bear some relationship 
to the problem. Whittle [519] specifically considers the residual sum of squares, but 
in relation to the principal-component solution. Howe seeks an alternative approach
to Lawley’s maximum-likelihood equations (see chap. 10) and finds that his method,
maximizing the determinant of partial correlations, “is approximately equivalent
to minimizing the sum of squares of the partial correlations” [268, p. 22]. Even more
germane is the 1962 paper by Keller [304], which is a generalized mathematical
treatment skirting the precise problem to which the minres solution is addressed.

It should be noted that none of the foregoing papers considers the minimization of 
off-diagonal residuals—the minimization of the total residual matrix (including 
diagonal terms) leads to the conventional principal-factor solution (see chap. 8). The 
exclusion of the diagonal elements, although appearing trivial, is of paramount 
importance. More specifically, as will be amplified below, the diagonal elements of 
the sample correlation matrix (the communalities) are not fixed but are parameters 
to be determined along with the factor loadings. 

Probably the first attempt to obtain a practical factor solution by minimizing
off-diagonal residuals was suggested by Thurstone in 1954 and carried out by Rolf
Bargmann and also by Sten Henrysson [see 479, p. 61]. More recently, Comrey [85]
independently developed a computing procedure for such a solution. However, these
investigators do not tackle the complete problem of determining a factor solution
with the property that the sum of squares of residuals between observed and
reproduced correlations be a minimum. Instead they consider what might be termed a
“stepwise” minimum residual method,* obtaining one factor and a residual matrix,
which is then the starting point for the factor in the next step; this process is continued
until a desired number of factors are extracted. In general, of course, such a solution
is different from one obtained under the least-squares criterion for the entire set of
factors. There has been one other attempt, by Boldt [44], which is more specifically
related to the method treated in this chapter. He poses the problem in essentially the
same form and considers solutions by procedures similar to those tried by Harman
and Jones [207, sec. 4].

* It may be of interest to note similar “stepwise” approximations to standard statistical
procedures, namely, the determination of the coefficients in multiple regression successively rather than
simultaneously [143], and an approximation to a maximum-likelihood factor solution [23].

The minres method assumes the classical factor analysis model (2.9), which is
repeated in matrix form as follows:

(2.35 bis)    $z = Af + Du.$

Only the common-factor loadings in the matrix $A = (a_{jp})$ are the parameters to be
estimated. Once such a solution is obtained, the fundamental theorem of factor
analysis gives (assuming uncorrelated factors, without loss of generality):

(2.50 bis)    $R^{\dagger} = AA',$

where $R^{\dagger}$ is a matrix of reproduced correlations with communalities in the principal
diagonal. What is required, then, is to get a “best” fit to the observed correlation
matrix $R$ by the reproduced correlations $R^{\dagger}$ employing model (2.35).

A least-squares fit can be obtained either by

(9.1)    fitting $R$ by $(R^{\dagger} + D^2)$,

or by

(9.2)    fitting $(R - I)$ by $(R^{\dagger} - H)$,

where

(9.3)    $H = I - D^2 = \mathrm{diag}(AA')$

is the diagonal matrix of communalities determined from the solution $A$. In the case
of (9.1), the minimization of residuals of the total matrix leads to the
principal-component solution (see 8.2). In the case of (9.2), however, minimizing only the
off-diagonal residuals leads to the minres solution. This condition may be expressed
more precisely by:

(9.4)    $\min_{A} \bigl\lVert [R - I] - [AA' - \mathrm{diag}(AA')] \bigr\rVert,$

in which it is emphasized that both $A$ and $H$ vary. The norm as expressed in (9.4)
may be written out algebraically as follows:

(9.5)    $f(A) = \sum_{k=j+1}^{n} \sum_{j=1}^{n-1} \Bigl( r_{jk} - \sum_{p=1}^{m} a_{jp} a_{kp} \Bigr)^2,$

which is to be minimized.

It should be noted that this function involves the $n(n-1)/2$ off-diagonal residual
correlations which are dependent upon the elements in the factor matrix $A$. The
objective of minres is to minimize the function $f(A)$, for a specified $m$, by varying the
values of the factor loadings. The diagonal matrix of communalities is obtained as a
by-product of the method.
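As a concrete reading of (9.5), the following sketch evaluates the off-diagonal sum of squared residuals for a given loading matrix, together with the communalities that fall out as a by-product. The function names and the three-variable, one-factor data are hypothetical and serve only to illustrate the formula.

```python
def minres_objective(R, A):
    """f(A): sum over j < k of (r_jk - sum_p a_jp * a_kp)^2, off-diagonal entries only."""
    n, m = len(A), len(A[0])
    total = 0.0
    for j in range(n - 1):
        for k in range(j + 1, n):
            reproduced = sum(A[j][p] * A[k][p] for p in range(m))
            total += (R[j][k] - reproduced) ** 2
    return total


def communalities(A):
    """Diagonal of AA': the communality of each variable implied by the loadings."""
    return [sum(a * a for a in row) for row in A]


# Hypothetical one-factor loadings and a correlation matrix they reproduce exactly.
A = [[0.8], [0.7], [0.6]]
R = [
    [1.00, 0.56, 0.48],
    [0.56, 1.00, 0.42],
    [0.48, 0.42, 1.00],
]
f_value = minres_objective(R, A)   # essentially zero: the off-diagonal fit is exact
h2 = communalities(A)              # approximately 0.64, 0.49, 0.36
```

Note that the diagonal of `R` never enters the computation, which is the point of contrast with a fit to the total matrix.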
It is tacitly assumed, for the moment, that the communality produced for each
variable is not greater than one. Actually, if only the condition (9.5) is imposed, an
occasional minres solution would be obtained for which a communality would
exceed unity. Of course, such a situation must be remedied if the factor analysis is to
be acceptable. After developing the basic theory, the side conditions

(9.6)    $h_j^2 = \sum_{p=1}^{m} a_{jp}^2 \le 1 \qquad (j = 1, 2, \dots, n)$

are introduced which restrict the communalities to numbers between zero and one
for a minres solution.

9.3. Minres Method 

As noted at the beginning of this chapter, the calculation of a minres solution is so
complex that it is feasible only with the aid of an electronic computer. Even then, it
can get very costly in computer time unless an efficient algorithm is available. Several
mathematical approaches were investigated, and tested empirically on many problems,
before the recommended procedure was developed [207]. Among the methods
explored and discarded were (1) the technique* of repeated calculations of a
principal-factor matrix $A$ and its associated communalities $H$ leading to improvements in the
objective function $f$; and (2) several variants of a class of mathematical techniques
known as “gradient methods.” The latter methods seek an optimal value (maximum
or minimum) of a function, iteratively, by proceeding from a trial solution to the
next approximation in the direction of maximal change in the function.

While the foregoing methods produced acceptable solutions, they were too
time-consuming. Another mathematical method, the Gauss-Seidel process [518, sec. 130],
proved to be much more efficient. This technique is sometimes called a “method
of successive displacements” because it is an iterative process in which small changes
are made in the variables and the corresponding new variables replace the original
ones. It can be applied effectively to the computation of a minres solution. From the
basic theorem of factor analysis (2.50), it is evident that if changes or displacements
are introduced in only one row of $A$, the reproduced correlations will be linear
functions of these displacements, and the objective function $f$ will be quadratic only.†
More explicitly, for any row $j$ in $A$ an increment $\epsilon_p$ $(p = 1, 2, \dots, m)$ is added to each
element:

$a_{j1} + \epsilon_1,\; a_{j2} + \epsilon_2,\; \dots,\; a_{jp} + \epsilon_p,\; \dots,\; a_{jm} + \epsilon_m.$

The new factor loadings may be written in the form:

(9.7)    $b_{jp} = a_{jp} + \epsilon_p \qquad (p = 1, 2, \dots, m),$

where the b’s are used for clarity; but ultimately, when the final set of factor loadings
is obtained they are again designated by a’s for simplicity of notation.

* This technique was employed in arriving at the solution of Table 8.14. The corresponding
minres solution is given in Table 9.1.

† This reduces the computing time considerably below that required in the gradient methods,
where fourth-degree polynomials have to be solved [see 207].

Then the reproduced correlation of the fixed variable $j$ with any other variable $k$ is

(9.8)    $\hat{r}_{jk} = \sum_{p=1}^{m} a_{kp} b_{jp},$

and the sum of squares of residual correlations with this variable is given by

(9.9)    $f_j = \sum_{\substack{k=1 \\ k \ne j}}^{n} \Bigl( r_{jk} - \sum_{p=1}^{m} a_{kp} b_{jp} \Bigr)^2 \qquad (j \text{ fixed}).$

Upon separating out the original factor loading from the incremental change,
according to (9.7), the last expression becomes:

(9.10)    $f_j = \sum_{\substack{k=1 \\ k \ne j}}^{n} \Bigl( r^{*}_{jk} - \sum_{p=1}^{m} a_{kp} \epsilon_p \Bigr)^2 \qquad (j \text{ fixed}),$

where $r^{*}_{jk}$ are the original residual correlations of variables $k$ with the fixed variable $j$
(without the incremental changes in its factor loadings), that is,

(9.11)    $r^{*}_{jk} = r_{jk} - \sum_{p=1}^{m} a_{kp} a_{jp} \qquad (k = 1, 2, \dots, n;\; k \ne j).$

To determine the values of the $\epsilon$’s which minimize the objective function $f$, first
take the partial derivatives of (9.10) with respect to each of these, say $\epsilon_q$, as follows:

$\frac{\partial f_j}{\partial \epsilon_q} = 2 \sum_{\substack{k=1 \\ k \ne j}}^{n} \Bigl( r^{*}_{jk} - \sum_{p=1}^{m} a_{kp} \epsilon_p \Bigr)(-a_{kq}) \qquad (q = 1, 2, \dots, m).$

Then set these expressions equal to zero, and obtain the following implicit equations
for the $\epsilon$’s:

(9.12)    $\sum_{p=1}^{m} \epsilon_p \Bigl( \sum_{\substack{k=1 \\ k \ne j}}^{n} a_{kp} a_{kq} \Bigr) = \sum_{\substack{k=1 \\ k \ne j}}^{n} r^{*}_{jk} a_{kq} \qquad (q = 1, 2, \dots, m).$

This may be put in matrix form,

(9.13)    $\epsilon_j A'_{)j(} A_{)j(} = r^{*}_j A_{)j(},$

where $\epsilon_j = (\epsilon_1, \epsilon_2, \dots, \epsilon_m)$ is the row vector of incremental changes of the factor
loadings for variable $j$, $A_{)j(}$ is the factor matrix with the elements in row $j$ replaced by
zeros, and $r^{*}_j$ is the row vector of residual correlations of variable $j$ with all other
variables (and 0 for the self-residual). Then the solution for the displacements to the
factor loadings (for a given variable) that will minimize the objective function, is:

(9.14)    $\epsilon_j = r^{*}_j A_{)j(} \bigl( A'_{)j(} A_{)j(} \bigr)^{-1}.$
The foregoing process is carried out systematically for all variables, in turn. Thus
successive approximations of rows of factor loadings are obtained which yield a
minimum value for the function $f$ to any desired degree of accuracy. However, there
is no guarantee that the resulting matrix of factor loadings will not lead to
communalities greater than one. This problem is taken up in the next section.
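The row-by-row displacement scheme can be sketched as follows. This is a minimal illustration rather than the published program: the function names, the fixed number of sweeps in place of a convergence test, and the small Gauss-Jordan solver are all assumptions made for the example, and no safeguard against communalities above one is included. For each variable the normal equations (9.12) are formed from the other rows of the loading matrix and solved for the increments.

```python
def solve(M, v):
    """Solve M x = v for a small square system by Gauss-Jordan elimination."""
    m = len(v)
    aug = [M[i][:] + [v[i]] for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))   # partial pivoting
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(m):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                for cc in range(c, m + 1):
                    aug[r][cc] -= factor * aug[c][cc]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def update_row(R, A, j):
    """Replace row j of A by a_j + eps, where eps solves the normal equations (9.12)."""
    n, m = len(A), len(A[0])
    others = [k for k in range(n) if k != j]
    # Residual correlations of variable j with the current loadings, as in (9.11).
    resid = {k: R[j][k] - sum(A[k][p] * A[j][p] for p in range(m)) for k in others}
    gram = [[sum(A[k][p] * A[k][q] for k in others) for q in range(m)] for p in range(m)]
    rhs = [sum(resid[k] * A[k][q] for k in others) for q in range(m)]
    eps = solve(gram, rhs)
    A[j] = [A[j][p] + eps[p] for p in range(m)]


def minres_gauss_seidel(R, A_start, sweeps=50):
    """Cycle through all variables in turn, a fixed number of times."""
    A = [row[:] for row in A_start]
    for _ in range(sweeps):
        for j in range(len(A)):
            update_row(R, A, j)
    return A


# Hypothetical one-factor example: R is built from known loadings t.
t = [0.8, 0.7, 0.6, 0.5]
R = [[1.0 if j == k else t[j] * t[k] for k in range(4)] for j in range(4)]
A_fit = minres_gauss_seidel(R, [[0.5] for _ in range(4)])
```

Because each updated row is used immediately when the next row is treated, this is a successive-displacement scheme in the sense described above. A production routine would stop on the change in the objective function rather than after a fixed number of sweeps.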


9.4. Additional Theory 

When a factor solution leads to the communality of some variable being greater 
than one, it is referred to as a “Heywood case” (see 7.3). To constrain the minres 
method to proper solutions, Harman and Fukuda [205] developed a mathematical 
programming procedure in which the final matrix A is obtained by minimizing (9.5) 
subject to the conditions (9.6). This is introduced as a modification to the basic 
computing procedure of the last section only in those instances when a communality 
exceeds one. 

Starting with the computing procedure of 9.3, the impact on the objective function
(9.5) of replacing the $a_{jp}$ (for a fixed variable $j$) by $b_{jp}$, as defined in (9.7), is given by
(9.9). This function, then, is to be minimized subject to

(9.15)    $\sum_{p=1}^{m} b_{jp}^2 \le 1,$

i.e., the new values of the factor loadings must satisfy the constraints (9.6) as well.
At this stage of the process, the $r_{jk}$ and the $a_{kp}$ are known and only the $b_{jp}$ may vary.

If the minimum of $f_j$, as defined in (9.9), is obtained at a point $(b_{j1}, b_{j2}, \dots, b_{jm})$
which belongs to the region defined by (9.15) there is no problem, and no modification
is required. If the point does not belong to the region, then the problem gets
complicated primarily because of the inequality in the side condition. This inequality
may be removed by means of the following:

Theorem 9.1. If the minimum of $f_j$ is attained at a point outside of the region defined by
(9.15), then a minimum of $f_j$ under the constraint (9.15) will be attained at a boundary
point of the region, so that the constraint may be replaced by

(9.16)    $\sum_{p=1}^{m} b_{jp}^2 = 1.$

The proof [205, pp. 565-68] consists of transforming the quadratic form (9.9), which
involves the diagonalization of a symmetric matrix and transformations of the
variables $b_{jp}$, to the simplified expression:

(9.17)    $f_j = \sum_{p=1}^{m} (x_p - \xi_p)^2 + K$

and subject to the constraint

(9.18)    $\sum_{p=1}^{m} \frac{x_p^2}{\lambda_p^2} \le 1.$
In these derived expressions, the $x_p$ are functions of the original $b_{jp}$ and are the
quantities to be determined; while the $\lambda_p$ (eigenvalues of an $m \times m$ matrix), the $\xi_p$,
and $K$ are all constants determined from the known $r_{jk}$ and $a_{kp}$ and the intervening
transformations. From the simplified form (9.17), it is evident that $f_j$ is the sum of a
constant ($K$) and the square of the distance between a fixed point $(\xi_1, \xi_2, \dots, \xi_m)$ and a
variable point $(x_1, x_2, \dots, x_m)$ belonging to the region defined by (9.18). Then,
minimization of $f_j$ is equivalent to locating a point satisfying (9.18) which is at the
minimum distance from the given point $(\xi_1, \xi_2, \dots, \xi_m)$.

If the given point belongs to the region, i.e.,

(9.19)    $\frac{\xi_1^2}{\lambda_1^2} + \frac{\xi_2^2}{\lambda_2^2} + \cdots + \frac{\xi_m^2}{\lambda_m^2} \le 1,$

then this point itself is the minimizing point, and the solution is

(9.20)    $x_p = \xi_p \qquad (p = 1, 2, \dots, m).$

On the other hand, if the given point is outside the region, i.e.,

(9.21)    $\frac{\xi_1^2}{\lambda_1^2} + \frac{\xi_2^2}{\lambda_2^2} + \cdots + \frac{\xi_m^2}{\lambda_m^2} > 1,$

then a point $(x_1, x_2, \dots, x_m)$ belonging to the region must lie on its boundary in order
to be at a minimum distance from the given point. Furthermore, since the $x_p/\lambda_p$ are
obtained by an orthogonal transformation from the original variables, distance is
preserved; therefore the point on the boundary of the region can be expressed in
terms of the $b_{jp}$ as in (9.16).

Having reduced the side condition to an equality, conventional mathematical
methods are applicable to the problem of minimizing the function under the
constraint, when the minimum of $f_j$ is attained outside the region (9.15). The proof of
Theorem 9.1 provides additional information which facilitates the solution of the
problem. First, it shows that when the minimum of $f_j$ is attained in the region (9.15)
its value is given by $K$ in (9.17) and the minimizing point is (9.20). More important,
for the case of the minimum of $f_j$ being attained outside this region, the foregoing
development suggests a much more tractable approach than that originally posed
by the problem of minimizing (9.9) under the constraint (9.16). The simplified
problem, which follows from (9.17) and (9.18), is to minimize

(9.22)    $(x_1 - \xi_1)^2 + (x_2 - \xi_2)^2 + \cdots + (x_m - \xi_m)^2$

under the constraint

(9.23)    $\frac{x_1^2}{\lambda_1^2} + \frac{x_2^2}{\lambda_2^2} + \cdots + \frac{x_m^2}{\lambda_m^2} = 1.$

The method of Lagrange’s multipliers (as employed in 8.3) is especially suitable to
this problem. This involves the creation of a new function, namely the function (9.22)
minus $\mu$ (the Lagrange multiplier) times the function in (9.23), and setting its partial
derivatives with respect to the $m$ variables $x_p$ equal to zero. This leads to the equations

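(9.24)    $x_1 - \xi_1 - \mu \frac{x_1}{\lambda_1^2} = 0,$
          $x_2 - \xi_2 - \mu \frac{x_2}{\lambda_2^2} = 0,$
          $\cdots$
          $x_m - \xi_m - \mu \frac{x_m}{\lambda_m^2} = 0,$

which, together with (9.23), constitute a set of $(m + 1)$ equations in $(m + 1)$ unknowns
$x_1, x_2, \dots, x_m, \mu$.

The parameter $\mu$ can be determined from any one of the equations (9.24), namely,

(9.25)    $\mu = \frac{\lambda_p^2 (x_p - \xi_p)}{x_p} \qquad (p = 1, 2, \dots, m),$

and may be eliminated by setting any one of the $m$ determinations equal to any other.
Thus, each of the subsequent determinations (9.25) may be expressed in terms of
the first, i.e.,

$\lambda_p^2 \Bigl( 1 - \frac{\xi_p}{x_p} \Bigr) = \lambda_1^2 \Bigl( 1 - \frac{\xi_1}{x_1} \Bigr),$

and, solving explicitly for the remaining unknowns $x_p$ in terms of $x_1$ produces:

(9.26)    $x_p = \frac{\lambda_p^2 \xi_p x_1}{(\lambda_p^2 - \lambda_1^2) x_1 + \lambda_1^2 \xi_1} \qquad (p = 2, 3, \dots, m).$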

Before proceeding to the general solution to the problem of minimizing (9.22) under
the constraint (9.23), some special situations should be noted. If $\xi_p = 0$ for any $p$,
then $x_p = 0$ must be a solution in order to minimize the distance, and the terms
corresponding to this $p$ may be deleted. Furthermore, it may be assumed that $\xi_p > 0$
for every $p$. If $\xi_p$ were negative for any $p$, it could be replaced by $|\xi_p|$ and the
resulting solution $x_p$ replaced by $-x_p$. Therefore it may be assumed that every $x_p$ is positive.

Substitution of the values (9.26) into (9.23) gives rise to a polynomial equation in
$x_1$ of degree $2m$. The direct solution of such an equation can become quite
cumbersome, so a numerical method of successive approximations is employed. The basis
for it rests on the following:

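Theorem 9.2. For a given $x_1$ between 0 and $\min(\xi_1, \lambda_1)$, with $x_p$ $(p = 2, 3, \dots, m - 1)$
determined by (9.26) and $x_m$ by (9.23), if

(9.27)    $\lambda_m^2 \Bigl( 1 - \frac{\xi_m}{x_m} \Bigr) \gtrless \lambda_1^2 \Bigl( 1 - \frac{\xi_1}{x_1} \Bigr),$

then

(9.28)    $x_1 \lessgtr x_1^{*},$

where $x_1^{*}$ designates the solution for $x_1$.

The proof begins with the fact that $x_p$ is an increasing function of $x_1$ (the $\xi$’s being
assumed positive). Then the two conclusions are reached by the following reasoning:
If $x_1 < x_1^{*}$, then $x_p$ is less than its solution $x_p^{*}$, and, consequently, $x_m$ is larger than its
solution $x_m^{*}$. Therefore,

$\lambda_m^2 \Bigl( 1 - \frac{\xi_m}{x_m} \Bigr) > \lambda_m^2 \Bigl( 1 - \frac{\xi_m}{x_m^{*}} \Bigr) = \lambda_1^2 \Bigl( 1 - \frac{\xi_1}{x_1^{*}} \Bigr) > \lambda_1^2 \Bigl( 1 - \frac{\xi_1}{x_1} \Bigr).$

If $x_1 > x_1^{*}$, then a similar argument leads to

$\lambda_m^2 \Bigl( 1 - \frac{\xi_m}{x_m} \Bigr) < \lambda_m^2 \Bigl( 1 - \frac{\xi_m}{x_m^{*}} \Bigr) = \lambda_1^2 \Bigl( 1 - \frac{\xi_1}{x_1^{*}} \Bigr) < \lambda_1^2 \Bigl( 1 - \frac{\xi_1}{x_1} \Bigr),$

completing the proof.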
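The constrained problem, finding the point of an ellipsoid nearest to a given outside point, can be illustrated numerically. The sketch below does not follow the successive approximation in $x_1$ described in the text; it substitutes a plain bisection on the Lagrange multiplier, which relies only on the stationarity conditions $x_p - \xi_p - \mu x_p / \lambda_p^2 = 0$ and the boundary condition $\sum_p x_p^2 / \lambda_p^2 = 1$. The function name and the test points are hypothetical, and the constraint is assumed to have the form just written, with the $\lambda_p$ acting as semi-axes.

```python
def nearest_on_ellipsoid(xi, lam, iters=200):
    """Point x minimising sum (x_p - xi_p)^2 subject to sum (x_p / lam_p)^2 <= 1.

    Returns (x, mu), where mu is the Lagrange multiplier (0 if xi is already inside).
    """
    def point(t):
        # From x_p - xi_p - mu * x_p / lam_p^2 = 0 with mu = -t (t >= 0).
        return [x * l * l / (l * l + t) for x, l in zip(xi, lam)]

    def excess(t):
        return sum((x / l) ** 2 for x, l in zip(point(t), lam)) - 1.0

    if excess(0.0) <= 0.0:
        return list(xi), 0.0              # given point lies in the region
    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:               # expand until the point falls inside
        hi *= 2.0
    for _ in range(iters):                # excess(t) decreases steadily as t grows
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return point(t), -t


# Hypothetical check on a unit circle: the nearest point to (3, 4) is (0.6, 0.8).
x, mu = nearest_on_ellipsoid([3.0, 4.0], [1.0, 1.0])
```

The multiplier comes out negative whenever the given point lies outside the region, since each $x_p$ is then pulled toward the origin relative to $\xi_p$.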

Before leaving the theoretical development of the minres method, certain of its
features deserve emphasis. While the principal-factor solution, in general, is not a
minres solution, the converse is always true: a minres solution results from application
of the principal-factor method with appropriate diagonal entries. A principal-factor
solution for a correlation matrix with minres communalities cannot be different
from the minres solution that produced those communalities; if it were, both
the off-diagonal and the diagonal sums of squares of residuals would be increased,
in the former case because the off-diagonal sum is minimized by minres and in the
latter case because the diagonal sum is zero for minres. This means that the
principal-factor (for the specified communalities) and minres solutions are equivalent. When
put in canonical form (see 8.8) they are identical. In schematic form,

(9.29)    $(R - I + H_{\min}) \xrightarrow{\text{PFA}} A_{\min},$

the theorem states that a principal-factor analysis of a correlation matrix with
minres communalities produces a minres factor solution. A corollary property is
that a principal-factor solution of a correlation matrix with m-factor minres
communalities will have a sum of the m largest eigenvalues equal to the sum of the
communalities, while the remaining n − m eigenvalues will be positive and negative and
add to zero.

Just as a minres solution reproduces itself through PFA, so does a
maximum-likelihood solution (see chap. 10), viz.,

(9.30)    $(R - I + H_{ML}) \xrightarrow{\text{PFA}} A_{ML}.$

Of course, the principal-factor analysis of the correlation matrix with
maximum-likelihood communalities is in canonical form, while the original maximum-likelihood
solution (from which the communalities were taken) probably is not, and must first
be put in that form in order to verify the equivalence. Now, the factor matrix obtained
by (9.30) is a least-squares fit to $(R - I + H_{ML})$ with perfect fit of the diagonal, and
must therefore be a minres solution. The obvious implication of this is that minres,
with its computational advantages, may be used in place of the highly desirable
maximum-likelihood solution.

9.5. Test of Significance for the Number of Factors 

In most of this book the subject of factor analysis is treated essentially in a
mathematical fashion, accepting the observed data at their face value rather than as samples
from some universe values. While allusions are made to the “true” values, reproduced
correlations are replaced by observed correlations, residuals are assumed to vanish,
and the like, there is nonetheless no formal use of statistical estimation theory. It is
not intended to disparage the other work, but merely to call attention to the
difference between the crude approximate procedures and formal statistical tests.

It should be apparent that factor analysis deals with fallible data: the individual
measurements and the correlations among the variables are subject to the vicissitudes
of sampling. As a consequence, there is also sampling variation present in the results
of a factor analysis. In particular, the judgment concerning the statistical significance
of the number (m) of common factors should be based on their contribution to the
reproduced correlations as related to the actual sampling variations of these
correlations.

The problem of placing factor analysis on a sound statistical foundation has 
plagued its proponents from the very inception of the theory. In the early days, when 
the subject was relatively simple, considerable attention was paid to the statistical 
theory underlying its practical applications. Spearman set forth the conditions for 
his “Two Factor” theory in appropriate statistical terms, and some work was done 
on sampling errors of “tetrad difference” [e.g., 444]. In the rapid advance of factor 
analysis in the 1930’s, the emphasis was placed on the extension of the method to 
encompass matrices of correlations which obviously did not form a hierarchy in 
Spearman’s sense. The bulk of the work in this period was devoted to developing 
computing methods for analysis of a complex battery of psychological tests into 
multiple factors. In going from a single general factor to many common factors, the 
statistical questions were complicated manifold, but tended to be overlooked. Of 
course there were notable exceptions, such as the work of Hotelling [259]. 

In the early 1940’s the first concerted efforts were made by Lawley [320, 321] to 
provide a statistical basis for the new methods of factor analysis. He suggested the 
use of the “method of maximum likelihood,” due to Fisher [128,129], as the basis 
for estimating the universe values of the factor loadings from the given empirical 
data; and for such “efficient” methods of estimation, he provided a statistical test of 
significance concerning the number of factors required to explain the observed 
correlation coefficients. These methods are presented in the next chapter. 

Prior to the breakthrough in 1940 by Lawley [320] several less fruitful attempts
were made to establish statistical tests for the factor analysis model (2.9). Coombs [86]
considered the residual matrix, after any number of factors had been extracted, as
containing both common and error variance; and he introduced the notion of
“critical value” for that point at which an additional factor would contain error
variance overshadowing the common-factor variance. This attempt to determine
significant factors was designed specifically for the centroid method, and was
dependent upon sign changes of variables and the number of negative entries in the
residual matrices. Another attempt, tied to the bi-factor method of analysis, was
made by Holzinger and Harman [243, pp. 122-32]. After some laborious
manipulations, in the spirit of the earlier work on standard errors of tetrad differences,
approximate formulas were derived for the standard error of a residual and for a factor
coefficient. These results are shown in Tables A and B of the Appendix. Still another
attempt to derive a significance test for the number of common factors was made by
Hoel [231]. While he initially developed a test for a principal-component solution,
he subsequently modified it for the centroid solution in terms of the factor model (2.9).

Hotelling [259] was the first to provide a rigorous statistical test for the number of
significant factors in a principal-component solution. More recently, Bartlett [28]
has made further valuable contributions to significance tests. For an analysis into
principal components, he presents $\chi^2$ approximations for testing the statistical
significance of the unreduced correlation matrix and of the residual roots, i.e., after
several of the largest roots have been determined. Bartlett also recommends an
adjustment in the $\chi^2$ statistic for the conventional factor analysis model.

Probably the most important theoretical work that can be applied to the minres 
solution was performed by Rippe [401]. Specifically, he developed a large-sample
criterion for judging the completeness of factorization. The test is independent of the
particular type of factor solution (in contrast to the test of 10.4 which implies 
maximum-likelihood estimates of the factor loadings). Its basic assumption is that 
the original variables have a multivariate normal distribution, from which it follows 
that the correlations have a Wishart distribution (see 10.3) and the sample values 
are maximum-likelihood estimates of the population correlations. While Rippe’s 
development [401, pp. 193-96] is explicitly in terms of the sample covariance matrix, 
the results are equally applicable to a factor analysis of the sample correlation matrix. 

The statistic for testing the significance of m factors, in the notation of the present
text, may be put in the form:

(9.31)    $U_m = (N - 1) \log_e \dfrac{|AA' + D^2|}{|R|}$,

which is asymptotically distributed as $\chi^2$ with degrees of freedom equal to

(9.32)    $\nu = \tfrac{1}{2}[(n - m)^2 + n - m]$.


The test procedure is to reject the hypothesis of m common factors if $U_m$ exceeds the
value of $\chi^2$ for the desired significance level; otherwise it would be accepted. Of course,
if the hypothesis is rejected an alternate hypothesis of some larger number of factors 
may be assumed to explain the observed correlations. 
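
As a minimal sketch of this test procedure, assuming the two determinants have already been computed elsewhere, (9.31) and (9.32) might be coded as follows. The function names are illustrative and do not come from the text; the figures are those quoted in 9.7 for the eight physical variables with m = 3 and N = 305.

```python
import math


def u_statistic(det_reproduced, det_observed, n_cases):
    """Statistic (9.31): (N - 1) * log_e(|AA' + D^2| / |R|)."""
    return (n_cases - 1) * math.log(det_reproduced / det_observed)


def degrees_of_freedom(n, m):
    """Degrees of freedom (9.32): [(n - m)^2 + n - m] / 2."""
    # (n - m)^2 + (n - m) is always even, so integer division is exact.
    return ((n - m) ** 2 + (n - m)) // 2


# Eight physical variables, three common factors, N = 305 (figures from 9.7).
u3 = u_statistic(0.00105459, 0.00096740, 305)
nu3 = degrees_of_freedom(8, 3)
print(round(u3, 1), nu3)
```

The value printed for the statistic would then be compared with the tabled $\chi^2$ for the printed number of degrees of freedom.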

It should be noted that the derivation of (9.31) was based upon the sampling 
variation of the observed correlation matrix R (and consequently on the sampling 
variation of the factor matrix A). The variability of the individual correlations is, of
course, dependent on the sample size N. Since a correlation matrix would be subject
to extreme variation for small samples, it is standard practice to apply factor analysis 
only to large samples. Hence, the large-sample approximations made in the course 
of arriving at (9.31) are not really additional constraints on good experimental 
practice. 

The distinction between statistical significance and “practical significance” should 
be borne in mind. Statistical significance should convey the thought of a technical 
test with an associated probability level. Inferences about certain numerical values 
obtained from an empirical study cannot be made in an absolute sense, but must be 
made in terms of some kind of degree of belief, i.e., in a probabilistic sense. Statistical 
tests of hypotheses are then described in terms of some arbitrary levels of significance, 
which are usually expressed as percentages with popular values being 5 and 1 per cent. 
Then, if the difference between the theoretical value of a statistic and its value derived 
from the observed data were significant at the 1 per cent level, one would conclude 
that the difference was “real,” rejecting the null hypothesis of no difference. However, 
there may be practical considerations which vitiate such a conclusion. There may be
real additional statistical information, but it may have no practical importance. Thus,
in testing an hypothesis for the number of common factors required to explain the 
relationships in an observed correlation matrix (based upon a very large sample), the 
last one or two factors may prove to be highly significant in a statistical sense and 
still have no practical significance. 

On the basis of actual experience, factor analysts have developed crude guides for 
“when to stop factoring,” as indicated in 2.6. In addition to such crude judgments 
about the residual matrix (which, incidentally, a number of workers have shown to 
be remarkably close to the more exact statistical tests), another practical approach 
has been found to be useful. The proportion of the total variance (or total
communality) accounted for by each factor is considered. If, after 75 per cent (or 80 per
cent or 90 per cent) of the total variance is accounted for, any additional factor 
accounts for less than 5 per cent (or 2 per cent) it would not be retained. Such arbitrary 
consideration is quite apart from the statistical significance of such an additional 
factor—it is dropped because the decision was made beforehand that any factor 
having such small impact on the total variance could hardly have any practical 
significance. 
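
The stopping guide just described can be sketched in a few lines. This is one possible reading of the rule, not a procedure given in the text: a factor is dropped, together with all later ones, once the preceding factors already account for the chosen cumulative proportion and its own share falls below the chosen minimum. The function name and argument names are hypothetical; the contributions used are those of the three-factor minres solution for the eight physical variables (Table 9.3).

```python
def factors_retained(contributions, total_variance,
                     cumulative_cutoff=0.75, minimum_share=0.05):
    """Count the factors kept under the proportion-of-variance guide.

    contributions: variance accounted for by each factor, in order of extraction.
    """
    kept = 0
    cumulative = 0.0
    for c in contributions:
        share = c / total_variance
        # Drop this factor (and all later ones) when enough variance is
        # already explained and its own contribution is too small.
        if cumulative >= cumulative_cutoff and share < minimum_share:
            break
        cumulative += share
        kept += 1
    return kept


# Total variance of eight standardized variables is 8.
print(factors_retained([4.480, 1.533, 0.158], total_variance=8.0))  # -> 2
```

With the stricter cutoffs of 90 per cent and 2 per cent the same data would keep all three factors, which illustrates how much the outcome depends on the arbitrary levels chosen beforehand.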

Through very extensive applications of electronic computers, Kaiser [296] has 
arrived at a practical basis for finding the number of common factors that are necessary,
reliable, and meaningful for the explanation of the correlations among the
variables. His recommendation, after considering statistical significance, algebraically
necessary conditions, psychometric reliability, and psychological meaningfulness,
is that the number of common factors should be equal to the number of
eigenvalues greater than one of the correlation matrix (with unities in the diagonal). 
He has found this number to run from a sixth to about a third of the total number 
of variables (in the example of Table 8.19 this number is 5 out of 24 variables). 
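
Kaiser's recommendation can be illustrated with a short sketch. The eigenvalue routine below is a plain cyclic Jacobi method written for this illustration; it is a stand-in and is not the subroutine of Figure 8.3. The function names and the small correlation matrix are hypothetical.

```python
import math


def jacobi_eigenvalues(S, sweeps=50, tol=1e-12):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(S)
    a = [list(row) for row in S]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle that annihilates the (p, q) element.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):          # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):          # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_count(R):
    """Number of eigenvalues of R (unities in the diagonal) greater than one."""
    return sum(1 for ev in jacobi_eigenvalues(R) if ev > 1.0)


# Hypothetical matrix: two uncorrelated pairs of variables, r = .6 within each pair.
R = [[1.0, 0.6, 0.0, 0.0],
     [0.6, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.6],
     [0.0, 0.0, 0.6, 1.0]]
print(kaiser_count(R))  # -> 2
```

For this matrix the eigenvalues are 1.6, 1.6, .4, and .4, so two factors would be retained under the rule.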

It has been found by a number of workers that empirical tests of significance used 
by factor analysts frequently lead to about the same results as the more proper 



[...; p. 150]. Perhaps the statistician's complaint about arbitrariness
has become a “smoke-screen,” but practicing statisticians [...] tests
[...] are in many circumstances superfluous to an experienced worker
[...] may or may not indicate meaningful factors . . .


9.6. Computing Procedures 

The foregoing theoretical development can be adapted to ready calculation on an
electronic computer. An abbreviated flowchart for programming the minres method is
presented in Figure 9.1. A more detailed description of each step follows.

1. The complete correlation matrix R (with ones in the diagonal) is input
by rows. Also stored are such parameters as the number of variables, n; the
number of factors, m (or a range $m_1$ to $m_2$); the maximum number of iterations,
t; and the convergence criterion, $\varepsilon$ (not to be confused with the vector $\varepsilon$ of
[...]). It is convenient to have the input data printed [...].

2. The subroutine for the calculation of the eigenvalues and eigenvectors of a
real symmetric matrix is used many times in the course of getting a minres
solution. An example of such a subroutine is the flowchart of Figure 8.3 (also
see references in 8.6).

3. [...] subroutine has [...] and stored the arbitrary factor [...]

4. [...] with the [...] using step [...] the initial factor matrix [...]

5. A subroutine for the solution of a system of linear equations.

6. [...] a new factor matrix $A_1$ has been determined in which the loadings
in the first row have been replaced by the computed values (9.7). This
constitutes iteration 1. For each iteration i, a new factor matrix $A_i$ is
determined [...] (the subscript on the A represents the iteration number, not the pivot
variable).

7. The communality $h_j^2$ (of the new row of A) is tested to see if it is greater than
one. If $h_j^2 > 1$ proceed to step 8, otherwise to step 15.

8. [...] (j fixed, p = 1, 2, ..., m), plus a linear expression in these
[...] a constant [205, sec. 2]. The symmetric matrix W is determined.

9. Diagonalization of the matrix W is accomplished by means of the subroutine
of step 2. According to (8.21), any symmetric matrix may be diagonalized by 
means of an orthogonal transformation Q, given by: 

$Q'WQ = \Lambda$,

where $\Lambda$ is a diagonal matrix with elements $\lambda_1^2, \lambda_2^2, \ldots, \lambda_m^2$ which are the
eigenvalues of W, and the columns of Q are the eigenvectors of W.

10. Additional transformations which lead to the derived unknowns $x_p$ as functions
of the original $b_{jp}$, and to the derived constants $\xi_p$ and $K$ as functions
of the known $r_{jk}$ and $a_{kp}$. The result of this step leads to equation (9.17) for
the objective function, subject to the condition (9.18).

11. The process pivots on the first unknown $x_1$ and employs an iterative scheme
implied by Theorem 9.2 for the determination of the remaining x's. The
following initial value for $x_1$ seems convenient:



12. Compute the remaining $x_p$ (p = 2, 3, ..., m - 1) by use of (9.26) and $x_m$ by
use of (9.23).

13. The convergence of $x_1$ to its solution value, according to Theorem 9.2, is tested. The loop,
steps 11-13, constitutes the distance optimization routine, and the iteration
process is continued until $x_1$ converges to its solution within $10^{-8}$.

14. The modified matrix $A_i$ is determined. This completes the modification loop,
steps 8-14, to correct for $h_j^2 > 1$.

15. Compute residual matrix (with zeros in diagonal), according to (9.2), from 
the factor matrix $A_i$ and the original correlation matrix R. The designation
$R_m$ merely calls attention to the fact that the residual matrix is based upon
m common factors, but the iteration number is omitted for the sake of 
simplicity. 

16. Compute and store the matrix of changes in the factor loadings, from the 
preceding to the current iteration, for later use. 

17. Test to see if each row of the factor matrix has been subjected to the Gauss-Seidel
process. The loop from step 4 to step 17, for j = 1, 2, ..., n, constitutes
a major iteration cycle. Thus, in the first major iteration cycle, the successive
factor matrices $A_1, A_2, \ldots, A_n$ are determined from the vectors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$
representing the incremental changes in the loadings for the n variables.
Similarly, a set of n factor matrices are determined in each of the major 
iteration cycles. 

18. At the conclusion of each major iteration cycle (i.e., after determining $A_{cn}$,
where c = 1, 2, 3, ... is the number of the major iteration cycle), the following
convergence criterion is applied:

$\max_{j,p} |{}_{(i)}a_{jp} - {}_{(i-1)}a_{jp}| < \varepsilon \quad (j = 1, \ldots, n;\ p = 1, \ldots, m)$,

where i is the iteration number. This test says that the maximum change in
the factor loadings, as stored in step 16, must be less than $\varepsilon$. For most problems
$\varepsilon = .001$ is satisfactory.

19. If the factor loadings have not converged, a test is made to see whether the 
preset maximum number of iterations has been reached. As a precautionary 
measure, t = 1,000 is recommended (so the process will terminate with 
i = cn, which is the smallest multiple of n exceeding 1,000). 

20. When the factor loadings have converged, or the maximum number of iterations
is reached, the last determined factor matrix is rotated to canonical form
(see 8.8). The subroutine of step 2 is again employed in this process. 

21. The output of the program includes the minres factor matrix A; the final value
of the objective function f; the matrix of final residuals; the frequency
distributions of residuals and changes in factor loadings, along with their
means and standard deviations; and the statistic $U_m$ and the number of
degrees of freedom $\nu$.

Although not shown in the flowchart, the computer program does determine $U_m$
according to (9.31) and the associated number of degrees of freedom. The table
lookup of $\chi^2$ for comparison must be done manually.
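
The main loop of the procedure (steps 4-7 and 15-19) can be sketched as follows. This is a simplified illustration and not the program of [203]: the modification loop for communalities greater than one (steps 8-14) and the rotation to canonical form (step 20) are omitted, the initial factor matrix is passed in by the caller, and the row update is an ordinary least-squares fit to the off-diagonal correlations in row j, which is assumed here to be what (9.7) prescribes. The names `solve_linear` and `minres_sketch` and the one-factor test data are hypothetical.

```python
def solve_linear(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting (cf. step 5)."""
    k = len(y)
    aug = [list(M[i]) + [y[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, k))
        x[i] = (aug[i][k] - tail) / aug[i][i]
    return x


def minres_sketch(R, A0, eps=0.001, t=1000):
    """Row-by-row (Gauss-Seidel) least-squares fit to the off-diagonal correlations.

    Returns the factor matrix and the number of iterations (row replacements).
    No correction is made when a communality exceeds one.
    """
    n, m = len(A0), len(A0[0])
    A = [list(row) for row in A0]
    cap = -(-t // n) * n              # smallest multiple of n that reaches t
    iterations = 0
    while iterations < cap:
        largest_change = 0.0
        for j in range(n):            # one major iteration cycle
            others = [k for k in range(n) if k != j]
            M = [[sum(A[k][p] * A[k][q] for k in others) for q in range(m)]
                 for p in range(m)]
            y = [sum(R[j][k] * A[k][p] for k in others) for p in range(m)]
            new_row = solve_linear(M, y)
            change = max(abs(new_row[p] - A[j][p]) for p in range(m))
            largest_change = max(largest_change, change)
            A[j] = new_row
            iterations += 1
        if largest_change < eps:      # test applied after each full cycle (step 18)
            break
    return A, iterations


# Hypothetical one-factor example: correlations reproduced exactly by these loadings.
true_loadings = [0.9, 0.8, 0.7, 0.6]
R = [[1.0 if j == k else true_loadings[j] * true_loadings[k] for k in range(4)]
     for j in range(4)]
A, iterations = minres_sketch(R, [[0.5], [0.5], [0.5], [0.5]], eps=1e-8)
print([round(row[0], 3) for row in A], iterations)
```

The iteration cap is written as the smallest multiple of n that reaches t, so that the run always ends on a complete major cycle; for n = 42 and t = 1,000 this gives 1,008.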

9.7. Illustrative Examples 

A computer program following the procedures of 9.6 has been written [203] in 
FORTRAN and used on a Philco 2000 to get minres solutions for dozens of problems 
in the course of the experimental work, and subsequently adapted to such computers 
as IBM 7044 and CDC 3200. It has been very satisfactory, not only in meeting the 
objective of minimizing the off-diagonal residuals but also in accomplishing this 
very efficiently. The time requirements for the calculation of minres solutions are 
indicated in Table 9.1. For a given problem, the time necessary to get a minres solution
must be more than that required for a principal-factor analysis since the latter
(at least the eigenvalue-eigenvector routine) is employed at several stages in the 
course of getting a minres solution. Nonetheless, the actual time requirements for the 
minres method are exceedingly short for small to modest-sized problems. With the 
new generation of fast computers the method should be practical even for very large 
problems. 

Before considering the individual problems, a general note is in order. It should be 
apparent that the “best” fit of a model to empirical data, in the sense of the objective 
function f being a minimum, may not appear very convincing. If the model specifies
two factors for a set of 50 variables, it is to be expected that the residuals may be of 
sizable magnitude, although the sum of squares of off-diagonal residuals has been 
minimized. All that can be said is that for the given hypothesis, the resulting minres 
solution best satisfies the least-squares criterion. As regards the actual significance of 
factors in the statistical sense, the procedure of 9.5 may be applied for large samples. 

In any event, for a given hypothesis regarding the number of factors, it is desirable 
to obtain a stable factor solution. The objective, then, is to set a convergence criterion 
to guarantee the accuracy of the factor loadings. To be sure that the minres solution
has stabilized, the maximum change from one iteration to the next of all factor
loadings is required to be bounded by some pre-assigned small number, as specified
in step 18 of the program flowchart.

Table 9.1

Computing Time for Minres Solutions

 Problem Size            Time Estimates (a)
   n     m       Philco 2000       Computers of
                 (Model 210)       late 1960's
   5     2       3 sec             1-5 sec
   8     2       5 sec             1-5 sec
   8     3       18 sec            1-5 sec
   9     2       5 sec             1-5 sec
   9     3       12 sec            1-5 sec
  24     4       39 sec            5-30 sec
  24     5       3.0 min           5-30 sec
  24     6       5.6 min           5-30 sec
  36    12       8.0 min           40 sec
  42     7       6.1 min           30-60 sec
  42     8       6.2 min           30-60 sec
  42     9       6.0 min           30-60 sec
  42    10       12.5 min          1-4 min
  42    11       15.4 min          1-4 min
  42    12       26.3 min (b)      1-4 min
  42    13       30.0 min (b)      1-4 min
  42    14       27.1 min          1-4 min
  42    15       31.8 min (b)      1-4 min

(a) Based upon actual experience on the Philco computer and
estimates of time on such large-scale machines as IBM System/360,
CDC 6600, and GE 625.

(b) Did not converge to the standard of step 18 of the program
flowchart, so that the time is for 1,008 iterations.

1. Five socio-economic variables.— The first illustration is again for the simple 
numerical example introduced in chapter 2. From the previous experience with this 
problem, it seems reasonable to assume two common factors. Since the starting point 
for the minres calculations is an arbitrary factor matrix, the first two principal 
components are selected for that purpose (this is done automatically by the computer 
program, once the number of factors m is specified). The computer output gives the 
solution of Table 9.2, along with the final value of the objective function (f = .00098),
the matrix of residuals and their frequency distribution, and the frequency distribution
of changes in the factor loadings for the last iteration from the preceding one.
While the latter data may be of interest in judging the adequacy of fit, and perhaps 
for the interpretation of results in a practical application, they are omitted here to 
conserve space. Also, tests for the statistical significance of the number of common 
factors are relegated to the problems and exercises for this chapter. It should be
evident from Table 9.2 that this problem involved a Heywood case. The first variable
required the use of the modification loop (steps 8-14 of the program flowchart) in
order to correct for $h_1^2 > 1$.

Table 9.2

Minres Solution for Five Socio-Economic Variables
(Two common factors)

Variable j                     F_1      F_2     h_j^2
1. Total population           .621    -.783    1.000
2. Median school years        .701     .522     .764
3. Total employment           .702    -.683     .958
4. Misc. profess. services    .881     .144     .797
5. Median value house         .781     .605     .976
Variance                     2.756    1.739    4.495

It is of interest to make certain comparisons of the minres and principal-factor 
results. Of course, the first two principal components account for more variance 
(4.670) than any other two factors, but the sum of squares of off-diagonal residuals 
produced from this initial matrix is not a minimum (f = .01217). For the minres
solution of Table 9.2 the objective function was improved considerably. Also, while
not designed to maximize variance, the two factors which provide “best” fit to the
off-diagonal correlations do account for 90 per cent of the total variance in this
simple problem. 
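
The communalities and factor contributions quoted for this example can be checked directly from the printed loadings. The short sketch below recomputes them from Table 9.2; the variable names are illustrative, and small discrepancies in the third decimal place are to be expected because the loadings themselves are rounded.

```python
# Loadings (F1, F2) for the five socio-economic variables, from Table 9.2.
loadings = [
    (0.621, -0.783),
    (0.701, 0.522),
    (0.702, -0.683),
    (0.881, 0.144),
    (0.781, 0.605),
]

# Communality of each variable: sum of squared loadings across factors.
communalities = [sum(a * a for a in row) for row in loadings]

# Contribution of each factor: sum of squared loadings down its column.
contributions = [sum(row[p] ** 2 for row in loadings) for p in range(2)]

# Total variance of standardized variables equals the number of variables.
share = sum(contributions) / len(loadings)

print([round(h, 3) for h in communalities])
print([round(c, 3) for c in contributions], round(100 * share))
```

The proportion printed last is the "90 per cent of the total variance" referred to above.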

2. Eight physical variables.—Next, the example of eight physical variables of 
5.4 is used, with hypotheses of two and three common factors. The resulting minres 
solutions are presented in Table 9.3. From a quick glance at the two factor patterns, 
one would wonder whether there is sufficient justification for the more elaborate
model: the fit of the two-factor solution is indicated by f = .01205, and of the
three-factor solution by f = .00452. Also, the latter accounts for 77.1 per cent of the total
variance, or slightly more than the 74.5 per cent accounted for by the former. 

A closer look at the solution for m = 2 factors provides some additional information
that may be useful for other applications of the minres method. If the model
were to fit the data precisely, all residuals would vanish; but, of course, this is not
to be expected of empirical data. The actual residuals range from -.027 to .044 with
a mean of zero to more than four decimal places and a standard deviation of .021.
From a practical point of view, the magnitude of the residuals may be considered too
small to provide another meaningful factor. However, for adequate statistical explanation
of the observed data on 305 cases a third factor may be required. The large sample
test of 9.5 can provide an answer to such a question. The statistic (9.31) for this 
example becomes: 

$U_2 = 304 \log_e (.00125958 / .00096740) = 79.8.$

Table 9.3

Minres Solutions for Eight Physical Variables
(Two and three common factors)

                               Hypothesis: m = 2         Hypothesis: m = 3
Variable j                     F_1     F_2    h_j^2      F_1     F_2     F_3    h_j^2
1. Height                     .856   -.324    .838      .860   -.322   -.160    .868
2. Arm span                   .848   -.412    .889      .867   -.432    .242    .998
3. Length of forearm          .808   -.409    .821      .803   -.396    .031    .803
4. Length of lower leg        .831   -.342    .808      .835   -.340   -.163    .839
5. Weight                     .750    .571    .889      .751    .583   -.113    .915
6. Bitrochanteric diameter    .631    .492    .640      .626    .492    .019    .635
7. Chest girth                .569    .510    .583      .565    .508    .001    .577
8. Chest width                .607    .351    .492      .611    .362    .182    .537
Variance                     4.449   1.510   5.959     4.480   1.533    .158   6.171


Here, again, a computer is necessary to obtain the determinants of the matrices of 
observed and reproduced correlations. The number of degrees of freedom is 21 
according to (9.32). It will be found from Table D in the Appendix that for 21 degrees
of freedom $\chi^2 = 46.8$ for P = .001. This says that the probability of getting a value
of $\chi^2$ in excess of 46.8 is only 1 in 1,000, and since the actual value $U_2 = 79.8$ is considerably
greater, the hypothesis of m = 2 is rejected and it must be assumed that at
least three common factors are required for adequate explanation of the observed 
data. 

Next, the statistical test for m = 3 can be made. Of course, the determinant of the 
matrix of observed correlations remains unchanged. The determinant of the matrix 
of reproduced correlations from the solution with three factors is .00105459, slightly 
smaller than that for the solution with two factors. The statistic (9.31) then has the 
value $U_3 = 26.2$, with 15 degrees of freedom. The corresponding $\chi^2$'s are 37.7 for
P = .001, 30.6 for P = .01, and 25.0 for P = .05. The hypothesis of three common
factors would be rejected at the 5 per cent level but would be accepted at the 1 per
cent level. 
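
The comparison against tabled values can be written out as a small sketch. The function name is hypothetical, and the critical values are simply those quoted in the text for 15 and 21 degrees of freedom rather than values computed by the code.

```python
def rejected_levels(u, critical_values):
    """Significance levels at which the hypothesis of m common factors is rejected.

    critical_values maps a significance level P to the tabled chi-square value.
    """
    return [level
            for level, chi2 in sorted(critical_values.items(), reverse=True)
            if u > chi2]


# Chi-square values for 15 degrees of freedom, as quoted for the test of m = 3.
critical_15 = {0.05: 25.0, 0.01: 30.6, 0.001: 37.7}
print(rejected_levels(26.2, critical_15))   # -> [0.05]

# The test of m = 2 used 21 degrees of freedom and the .001 level only.
print(rejected_levels(79.8, {0.001: 46.8}))  # -> [0.001]
```

An empty list would mean that the hypothesis is accepted at every level supplied.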

The solution with three common factors was used in an empirical investigation of 
the relationship between minres and maximum-likelihood solutions. As noted at the 
end of 9.4, a principal-factor analysis of a correlation matrix with maximum-likelihood
communalities produces a maximum-likelihood solution which is equivalent
to the original one from which the communalities were taken; yet this solution
is a minres solution. According to the conditions for minres and for maximum-likelihood
(see 10.3), the two solutions should be identical only if the communalities
are equal. How much the actual communalities for a set of n variables may differ and
still produce practically equivalent results is a matter for empirical investigation (so
long as it is understood that it is not mathematical equality that is sought). For the
current example the following solutions were obtained:

Maximum-likelihood (arbitrary initial matrix), 

Principal-factor (ML communalities), 

Minres (ML initial matrix), and 
Maximum-likelihood (minres initial matrix). 

After the maximum-likelihood solutions were put in canonical form, all four solutions 
were practically identical. Communalities and contributions of factors differed only 
in the third decimal place; and individual factor loadings, with only a few exceptions, 
agreed to within a few units in the third place. The equivalence of these solutions was 
found in spite of the fact that the 8 communalities range from .5 to 1.0 as can be 
seen in Table 9.3. 

It should be noted, however, that the minres solution found to be equivalent to a 
maximum-likelihood solution was for a particular local maximum (solution 2 of 
Table 10.6), while another maximum-likelihood solution (number 1 of Table 10.6) 
differed from it. This problem (local versus global maximum or minimum) is
unresolved for either the maximum-likelihood or the minres method. Even when a
computing procedure converges there is no assurance that the optimal point (maximum
or minimum) is for the entire surface in the multidimensional space or only for
a local area. In practical usage, this question may be immaterial as long as a reasonably 
good solution is obtained. 

3. Eight emotional variables.—Another eight-variable example is used for 
illustration (the eight emotional traits of 8.7), primarily because of the difficulties
encountered in trying to get a principal-factor solution based upon SMC's as communalities.
Of course, no such difficulties arose in the minres method because no
prior estimates of communalities are required. Instead, the hypothesis of two common 
factors was assumed (based on the prior experience with this problem), and the 
resulting minres solution is shown in Table 9.4. 

Table 9.4

Minres Solution for Eight Emotional Traits
(Two common factors)

Variable j      F_1      F_2     h_j^2
1              .982     .065     .968
2              .935    -.111     .888
3              .833    -.550     .997
4              .720    -.091     .526
5              .676     .330     .566
6              .526     .164     .304
7              .515     .583     .605
8              .355    -.128     .142
Variance      4.176     .820    4.996

[...] very similar to the principal-[factor] [...]
solution is larger than the estimated communality [...]. In the minres
[...] what would happen if minres solutions [...]
[...] solutions up to m = 8 [...] eight variables, then [...]
[...] completely explained (mathematically) by the time m = 6. Any [...]


Table 9.5

Characteristics of Minres Solutions for Eight Emotional Traits
(Two to six common factors)

Number of    Objective     Number of Communalities    Number of Iterations    Time
Factors      Function f    equal to 1 (to 2 dec.)     (max = 1000)            (on Philco 2000)
2            .07431        1                          56                      5.0 sec
3            .02605        2                          88                      18.7 sec
4            .01380        4                          1,000                   8.4 min
5            .00277        4                          1,000                   12.1 min
6            .00018        7                          1,000                   25.5 min


4. Twenty-four psychological tests. The final illustrations involve a fair-sized
problem [...] indicated in Table 9.1, minres solutions were obtained for all three [...]

Table 9.6

Minres Solution for Twenty-four Psychological Tests

[...] that is analyzed. [...] By looking at the variance accounted
[...] than the corresponding principal factor, with only minor differences for the first
four factors and a very significant difference for the fifth. Upon looking more closely
it is found that variable 19 is primarily responsible for the additional [...].
As a matter of fact, [...] solution, while the minres solution
produces the limiting value (1.00) for this communality. This was not the case for the
solution with four factors ($h_{19}^2 = .235$) but it remained at the limiting value
[...].

While the principal-factor method accounts for the maximum variance for a given
number of factors, it cannot account for more variance than [was] originally put [into the]
correlation matrix (i.e., the estimates of the communalities). In the principal-factor
solution of Table 8.21, the squared multiple correlations were analyzed. These are
known to be lower-bound estimates of the communalities. Therefore, it is not surprising
that the minres solution accounts for almost 4 per cent more than the total
communality calculated in the principal-factor solution. The by-product of the
minres method (the communalities determined for the specified number of factors)
certainly provides the best known approximation to the concept of communality.

In the discussion of this problem in 8.7 it was argued that five factors appeared to 
be the optimal number and that even four might suffice from a practical vantage point. 
That served as a guide in this chapter as well. Nonetheless, it is advisable to study the
effect of the different number of factors. Without carrying this notion to an extreme,
a comparison is made of certain statistics for the choices of four, five, and six factors.
Table 9.7 presents such statistics.

Table 9.7

Frequency Distributions of Minres Residuals for Twenty-four Psychological Tests
(Four, five, and six common factors)

                              Frequency
Class Interval          m = 4        m = 5        m = 6
More than .0475            29           25           16
 .0425 to  .0475            9            6            7
 .0375 to  .0425            6            6            5
 .0325 to  .0375            7            8           10
 .0275 to  .0325           16           11           10
 .0225 to  .0275           15           19           11
 .0175 to  .0225           19           10           14
 .0125 to  .0175           13           19           18
 .0075 to  .0125           12           12           18
 .0025 to  .0075           15           24           24
-.0025 to  .0025            9           11           25
-.0075 to -.0025           14           18           18
-.0125 to -.0075           12           17           15
-.0175 to -.0125           13            8           14
-.0225 to -.0175            9           11           14
-.0275 to -.0225           12           14            7
-.0325 to -.0275            6            7           16
-.0375 to -.0325           10            8            8
-.0425 to -.0375           10           10            2
-.0475 to -.0425            5            4            4
Less than -.0475           35           28           20
Total                     276          276          276
Mean                -.0000467    -.0000225    -.0000169
S.D.                   .04089       .03708       .03186
f(A)                   .45989       .37811       .27909

It should be obvious that as the complexity of the
model is increased, the fit should be improved. This is evident from the actual frequency
distributions and from the summary statistics (means, standard deviations,
and the objective function). The extent to which the model fits the off-diagonal
correlations may be judged not only by the decrease in the objective function as the
model increases from four to five to six factors, but also by comparing these values
with the corresponding ones for principal-component solutions. Thus, the improvement
in the sum of squares of residuals for each hypothesis is given by:

m = 4:  $f_{pc} - f_{min} = .46788$

m = 5:  $f_{pc} - f_{min} = .52775$

m = 6:  $f_{pc} - f_{min} = .55765$.

Thus, while the objective function is smaller for a larger number of factors, whether 
computed for a principal-component or a minres solution, the difference between the 
two solutions becomes more pronounced as the number of factors in a model goes up. 

Finally, the more precise statistical criterion of 9.5 is applied to test the hypothesis 
of five common factors. The values of the determinants, obtained on a computer, are 
as follows: 

$|AA' + D^2| = .0000507017$
$|R| = .0000107920$,

and for N = 145 cases, the statistic (9.31) becomes $U_5 = 222.8$. The number of degrees
of freedom, according to (9.32), is $\nu = 190$. While Table D does not extend to such a
large value of $\nu$, the probability can be approximated from the normal distribution.
This is accomplished by setting

$z = \sqrt{2\chi^2} - \sqrt{2\nu - 1}$,

treating it as a normal deviate with unit variance, and reading the area under the
normal curve from Table C in the Appendix. Then the probability for $\chi^2$ corresponds
to the area of a single tail of the normal curve beyond the particular deviate. If a $\chi^2$
equal to $U_5$ is considered, the normal deviate becomes z = 1.64 and

P = .50 - .4495 = .05,

where the area from the mean to the deviate is taken from Table C. Hence the
hypothesis of five common factors is accepted at the 5 per cent level of significance.
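
The normal approximation used here is easily reproduced. In the sketch below the tail area is taken from the complementary error function in the Python standard library rather than from Table C; the function name is illustrative.

```python
import math


def chi_square_tail_normal_approx(chi2, nu):
    """Approximate upper-tail probability of chi-square for large nu.

    Uses z = sqrt(2 * chi2) - sqrt(2 * nu - 1), treated as a unit normal deviate.
    """
    z = math.sqrt(2.0 * chi2) - math.sqrt(2.0 * nu - 1.0)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))   # area of a single tail beyond z
    return z, p


z, p = chi_square_tail_normal_approx(222.8, 190)
print(round(z, 2), round(p, 2))   # -> 1.64 0.05
```

The approximation is intended only for large numbers of degrees of freedom, beyond the range of the printed table.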
10.1. Introduction

It was mentioned in 9.5 that Lawley [320, 321] in the early 1940's made a fundamental
contribution to factor analysis by providing a statistical basis for judging the
adequacy of the model (2.9), with a specified number of factors, to explain an empirical 
correlation matrix. His statistical test for the number of common factors is dependent 
upon a particular type of factor solution, namely, maximum-likelihood estimates of 
the factor loadings. The amount of computation arising from this method restricted 
its use to small problems in the 1940’s and 1950’s. While present-day computers make 
the maximum-likelihood method feasible, the perplexing question of the actual 
convergence of the process still remains. 

Because this chapter is concerned with formal statistical theory, a brief summary 
of some of the basic ideas in statistical estimation is presented in 10.2. With this
foundation, the exposition of the maximum-likelihood methods in factor analysis
becomes more meaningful and clear. Such a development is given in 10.3, including
the essential mathematics for estimating the factor coefficients. This is followed in
10.4 by an asymptotic $\chi^2$ test of significance for the number of common factors.
Detailed computing procedures for estimating factor loadings by the
maximum-likelihood method and for testing an hypothesis regarding the number of factors are
outlined in 10.5 and illustrated with the example of eight physical variables. Further 
discussion of numerical illustrations is given in 10.6. 

10.2. Statistical Estimation

Most of the development in this text has been essentially in terms of mathematical 
solutions. While factor analysis methods were applied to sample correlation matrices, 
the interpretation of results tacitly has been in terms of the population from which 
the sample presumably was drawn. With the exception of the discussion in 9.5, little 
attention has been paid to the statistical problems of the uncertainty of conclusions
that might be derived from the empirical data. In this chapter, on the contrary, a
conscious distinction is made between the intercorrelations of the observed variables 
and the hypothetical values in the universe from which they were sampled. Then, 
based on the observed data, estimates are obtained of the universe factor weights, 
under the assumption of the factor model (2.9), and the statistical significance of such 
hypotheses is determined. 

To help clarify the ensuing methods of analysis, some of the fundamental concepts
in statistical estimation are reviewed in this section. First of all, the process of estimation
is concerned with making inferences about the values of unknown population
parameters from the incomplete data of a sample. Suppose, for example, that the
distribution function for a variable x is dependent upon two parameters, $\theta_1$ and $\theta_2$.
This is usually represented by $f(x; \theta_1, \theta_2)$, and may be conceived as a distribution
function of a single observation x. In contrast, the joint distribution of N (independent)
observations $x_1, x_2, \ldots, x_N$ is given by the product of the individual functions, viz.,

    $\prod_{i=1}^{N} f(x_i; \theta_1, \theta_2).$

When the foregoing is considered as a function of the $\theta$'s for fixed $x_i$'s it is often
called the “likelihood function” of the sample, and is denoted by L.

The typical estimation problem involves the determination of numerical values
for $\theta_1$ and $\theta_2$ from a random sample of N observations. To estimate the parameters
is then tantamount to finding functions of the observations—usually denoted by
$\hat{\theta}_1(x_1, x_2, \ldots, x_N)$ and $\hat{\theta}_2(x_1, x_2, \ldots, x_N)$—such that the distribution of these functions
in repeated samples will concentrate near the true values. Such functions of the
sample values are called estimators.

Since there are many choices for an estimating function, the following criteria are
frequently used in selecting among them:

1. An estimator $\hat{\theta}$ is said to be consistent if it converges (in a probabilistic sense)
to the true parameter as the sample increases without limit, i.e.,

    $\lim_{N \to \infty} \hat{\theta} = \theta.$

2. An estimator is said to be efficient if it has the smallest limiting variance.
When an estimator is efficient it is also consistent.

3. An estimator is said to be sufficient if it utilizes all the information in the
sample concerning the parameter.

4. If the expected value of the estimator is the true parameter, i.e.,

    $E(\hat{\theta}) = \theta,$

then the estimator is unbiased. While it is of some advantage to devise an
unbiased estimate, it is not a very critical requirement.

The method of maximum likelihood is a well-established and popular statistical
procedure for estimating the unknown population parameters because such estimators
satisfy the first three of the above standards. Not all parameters have sufficient
estimators but if one exists the maximum-likelihood estimator is such a sufficient
estimator [369, p. 185]. However, a maximum-likelihood estimator will generally
not be unbiased.* This method yields values of the estimators which maximize the
likelihood function of a sample.

An application of the method of maximum likelihood to a well-known situation
may be useful by way of introduction to the more complex problem treated in 10.3.
Consider the normal distribution for the single variable X:

(10.1)    $f(X; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(X - \mu)^2}{2\sigma^2}\right],$

which is dependent upon two parameters, the universe values of the mean and
standard deviation. The likelihood function is defined by

(10.2)    $L = \prod_{i=1}^{N} f(X_i; \mu, \sigma),$

and for the N independent observations takes the form:

(10.3)    $L = (2\pi)^{-N/2} \sigma^{-N} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{N} (X_i - \mu)^2\right].$

To get the maximum-likelihood estimators of $\mu$ and $\sigma$ it is necessary to maximize
the likelihood function L. However, since likelihood functions are products, and
many are expressed in terms of exponentials as in the example, it is customary to
maximize the logarithm (to the base e) of the likelihood instead. This is done merely
to simplify the mathematics because the maximum of the logarithm occurs at the
same point as that of the likelihood itself. Proceeding in this manner, the logarithm
of the likelihood (10.3) is

(10.4)    $\log L = -\frac{N}{2}\log 2\pi - N\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(X_i - \mu)^2.$

The maximum of this function of the two variables $\mu$ and $\sigma$ can be obtained by
setting its partial derivatives (with respect to each of the variables) equal to zero
and solving for them. Thus,

(10.5)    $\frac{\partial(\log L)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(X_i - \mu),$

          $\frac{\partial(\log L)}{\partial\sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}(X_i - \mu)^2,$

and upon equating these to zero, and solving for $\mu$ and $\sigma^2$, the following
maximum-likelihood estimators are obtained:

(10.6)    $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} X_i = \bar{X}, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \bar{X})^2.$

These estimators of the universe $\mu$ and $\sigma^2$ correspond to the well-known first and
second moments of the sample. It should be noted that while the estimator $\hat{\mu}$ is unbiased,
the estimator $\hat{\sigma}^2$ is not. Actually, the expected value of this estimator is slightly
less than the true parameter, viz.,

(10.7)    $E(\hat{\sigma}^2) = \frac{N - 1}{N}\sigma^2.$

In other words, an unbiased estimate of $\sigma^2$ (but not the maximum-likelihood estimate)
is

(10.8)    $\frac{1}{N - 1}\sum_{i=1}^{N}(X_i - \bar{X})^2.$

* By getting the expected value of such an estimator, an unbiased statistic can be derived.
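The distinction between the maximum-likelihood estimator of the variance (divisor N) and the unbiased estimate (divisor N - 1) can be illustrated numerically. The following is a minimal sketch; the function names and the four-value sample are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def normal_mle(x):
    """Maximum-likelihood estimates of the mean and variance of a normal sample."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    sigma2_hat = ((x - mu_hat) ** 2).mean()   # divisor N: the biased ML estimator
    return mu_hat, sigma2_hat

def unbiased_variance(x):
    """Unbiased estimate of the variance (divisor N - 1)."""
    x = np.asarray(x, dtype=float)
    return ((x - x.mean()) ** 2).sum() / (x.size - 1)

# Hypothetical sample, chosen only so that the arithmetic is easy to follow.
sample = [2.0, 4.0, 4.0, 6.0]
mu_hat, sigma2_hat = normal_mle(sample)
s2 = unbiased_variance(sample)

# The two variance estimates differ exactly by the factor (N - 1)/N.
ratio = sigma2_hat / s2
print(mu_hat, sigma2_hat, s2, ratio)
```

For any sample the ratio of the two variance estimates equals (N - 1)/N, so the difference becomes negligible as N grows.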


10.3. Maximum-Likelihood Estimates of Factor Loadings 

Instead of developing mathematical theory for the exact determination of the factor
coefficients under such assumptions as those that led to the bi-factor, principal-factor,
or minres solutions, the present approach is somewhat different. Under the assumption
of a given number (m) of common factors, the method of maximum likelihood
is applied to get estimators of the universe factor loadings from the sample of N
observations on the n tests. Subsequent tests of significance (see 10.4) can be applied
to determine the adequacy of the hypothesis regarding the number of factors.
Although the maximum-likelihood principle is relatively simple, the algebraic
manipulations are not; therefore the detailed mathematical derivations will be
omitted.

First, the model to be used will be stated, together with the assumptions as to
distributions. Specifically, the expression (2.9) for a variable in standard form is
adapted, more generally, to the case of any variable in its respective unit of measurement,
as follows:

(10.9)    $x_j = a_{j1}F_1 + a_{j2}F_2 + \cdots + a_{jm}F_m + d_jU_j.$

Without loss of generality, it may be assumed that the test scores have zero means.
It is further assumed that all factors $F_1, F_2, \ldots, F_m, U_1, U_2, \ldots, U_n$ are independent,
normally distributed variables with zero means and unit variances. A consequence
of this assumption is that the x's have a multivariate normal distribution.

The question of orthogonal or oblique factors is of no consequence here. The 
statistical estimation problem is concerned explicitly with the prediction of factor 
loadings for orthogonal factors out of consideration of the mathematical difficulties 
that would otherwise ensue. Once the (maximum-likelihood) common-factor space
[...]

A = (a_jp)
D = (d_j)

[...]

To make the problem [...] the problem of determining the [...] distribution [...]
can be determined. This [...] result was first [...] 1928, and may be
expressed as follows:

(10.10)    dF = [...]

[...] to get estimates $\hat{\Sigma} = \hat{A}\hat{A}' + \hat{D}^2$ [...] which maximize L. To this end
the logarithm (to the base e) of the likelihood function for Wishart's distribution is
first obtained: [...]

(10.16)    [...]

where, for simplicity, the caps have been removed from the estimates. Premultiplying
both sides of (10.18) by $A'D^{-2}$ yields*

(10.19)    $(A'D^{-2}A + I)A' = A'D^{-2}R,$

and defining

(10.20)    $J = A'D^{-2}A,$

this can be put in the form

(10.21)    $(I + J)A' = A'D^{-2}R.$

Finally, this equation can be simplified to

(10.22)    $JA' = A'D^{-2}R - A',$

which is amenable to an iterative method of solution. Equation (10.22) is employed
along with (10.14) and (10.16), but the matrix J defined in (10.20) is required to be
diagonal as an alternate condition to (10.17). The important effect of the simplified
procedure is to replace the calculation of the inverse of a matrix of order n by one
of order m—a tremendous saving since the number of common factors is usually
much smaller than the number of variables.

The iterative method for solving (10.22) for the factor loadings, due to Lawley
[321, p. 181], will now be described in the matrix notation of the present chapter,
and illustrated with a numerical example in 10.5. Since the factor loadings are computed
one factor at a time, with succeeding factor weights being dependent on those
already determined, it is convenient to designate the separate vectors of the
factor-pattern matrix A, viz.,

(10.23)    $A = (a_1 \; a_2 \; \cdots \; a_m),$

where any one of the m column vectors is given by

(10.24)    $a_p = \{a_{1p} \; a_{2p} \; \cdots \; a_{np}\} \qquad (p = 1, 2, \ldots, m).$

Corresponding to the trial values $a_p$, the values derived from the iterative process
are designated $b_p$, with B for the complete pattern matrix and $E^2$ for the new uniqueness
matrix. The iteration equations, exemplified by the following for the case of
three factors, are immediately generalizable to any number of factors:

(10.25)
    $b_1 = (RD^{-2}a_1 - a_1) / \sqrt{a_1'D^{-2}(RD^{-2}a_1 - a_1)}$

    $b_2 = (RD^{-2}a_2 - a_2 - b_1b_1'D^{-2}a_2) / \sqrt{a_2'D^{-2}(RD^{-2}a_2 - a_2 - b_1b_1'D^{-2}a_2)}$

    $b_3 = (RD^{-2}a_3 - a_3 - b_1b_1'D^{-2}a_3 - b_2b_2'D^{-2}a_3) / \sqrt{a_3'D^{-2}(RD^{-2}a_3 - a_3 - b_1b_1'D^{-2}a_3 - b_2b_2'D^{-2}a_3)}$

    $E^2 = I - \operatorname{diag} BB'$

* It is tacitly assumed that none of the uniquenesses vanishes.

To avoid further complications in notation, it is suggested that the b's and E be
replaced by a's and D after each iteration before beginning the calculations anew.
The computations by equations like (10.25) are repeated again and again until
convergence is obtained to the desired degree of accuracy. The final matrix A contains
the maximum-likelihood estimates of the factor loadings for the assumed number
of common factors.
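A single pass of the iteration equations (10.25) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the program referred to in the text: the function name `lawley_step` is hypothetical, the uniquenesses are taken as one minus the communalities of the trial loadings, and the data are the correlations of Table 10.2 and the trial values in lines 1-3 of Table 10.3.

```python
import numpy as np

# Correlation matrix R for the eight physical variables (Table 10.2).
R = np.array([
    [1.000, .846, .805, .859, .473, .398, .301, .382],
    [.846, 1.000, .881, .826, .376, .326, .277, .415],
    [.805, .881, 1.000, .801, .380, .319, .237, .345],
    [.859, .826, .801, 1.000, .436, .329, .327, .365],
    [.473, .376, .380, .436, 1.000, .762, .730, .629],
    [.398, .326, .319, .329, .762, 1.000, .583, .577],
    [.301, .277, .237, .327, .730, .583, 1.000, .539],
    [.382, .415, .345, .365, .629, .577, .539, 1.000],
])

# Initial trial loadings (lines 1-3 of Table 10.3), one column per factor.
A0 = np.array([
    [.894, .896, .866, .871, .656, .554, .490, .551],
    [-.182, -.301, -.291, -.209, .663, .577, .587, .403],
    [-.070, .078, .011, -.101, -.148, -.051, -.292, .592],
]).T

def lawley_step(R, A):
    """One pass of the iteration equations for any number of factors (hypothetical helper)."""
    d2 = 1.0 - np.sum(A ** 2, axis=1)      # uniquenesses: diagonal of D^2
    B = np.zeros_like(A)
    for p in range(A.shape[1]):
        w = A[:, p] / d2                   # D^{-2} a_p
        v = R @ w - A[:, p]                # R D^{-2} a_p - a_p
        for q in range(p):                 # corrections for factors already revised
            v -= B[:, q] * (B[:, q] @ w)   # b_q b_q' D^{-2} a_p
        B[:, p] = v / np.sqrt(w @ v)       # divide by the square root of a_p' D^{-2} v
    e2 = 1.0 - np.sum(B ** 2, axis=1)      # new uniquenesses: I - diag(BB')
    return B, e2

B, e2 = lawley_step(R, A0)
print(np.round(B.T, 3))
print(np.round(e2, 4))
```

The rows printed can be compared with lines 10, 14, 19, and 20 of Table 10.3; small differences in the third decimal place are to be expected because the tabled trial values are rounded. Repeating the call with `B` in place of `A0` corresponds to the replacement of the b's by a's described above.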

A variant of the preceding method was recently developed and a computer
program made available [204]. While still based on the assumption (10.18), it does
not merely discard the off-diagonal values of J (in substituting (10.20) for (10.17)
as the side condition) but actually diagonalizes this matrix. The essential part of the
program can be described as follows:

(10.26)
1. Start with an arbitrary factor matrix $A_{1/2}$ (usually the first m principal
   components). The reason for the subscript ½ will become clear in the
   ensuing steps.
2. $D_i^2 = \operatorname{diag}(I - A_{i-1/2}A'_{i-1/2})$, where i = 1, 2, 3, ··· is the iteration number.
3. $J_{i-1/2} = A'_{i-1/2}D_i^{-2}A_{i-1/2}$, from definition (10.20).
4. $J_i = Q_i'J_{i-1/2}Q_i$, according to (8.21).
5. $A_i = A_{i-1/2}Q_i$, the rotation of $A_{i-1/2}$ for which $J_i$ is diagonal.
6. $A_{i+1/2} = (RD_i^{-2} - I)A_iJ_i^{-1}$, from (10.22).
7. Test for convergence: $|A_{i+1/2} - A_{i-1/2}| < \varepsilon$?

In the foregoing algorithm the subscripts with the ½'s designate intermediate
matrices that are employed in the course of a particular iteration, which is represented
by a subscript without a ½ in it. The point of departure is step 4 in which the matrix J
is diagonalized. This is accomplished, in the i-th iteration, by means of the Spectral
Theorem (8.21) which states that for $Q_i$ orthogonal (i.e., $Q_iQ_i' = I$), the symmetric
matrix $J_{i-1/2}$ is diagonalized by the transformation matrix $Q_i$ so that the new matrix
$J_i$ is the diagonal matrix of eigenvalues of $J_{i-1/2}$; and the columns of $Q_i$ are the
(normalized) eigenvectors of $J_{i-1/2}$. After the new matrix of factor loadings is obtained,
the test for convergence is applied. The loop, steps 2-7, is repeated until the maximum
change in the factor loadings is below some specified level (usually
$\varepsilon$ = .001).
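The loop of steps 2-7 can be sketched as follows. This is a hedged illustration rather than the published program [204]: the function name, the `max_iter` guard, and the sign convention applied to the eigenvectors (so that an already diagonal J is left unrotated) are assumptions added here, and, as the text goes on to note, convergence is not guaranteed in general.

```python
import numpy as np

def ml_factors_diagonalized(R, A_start, eps=1e-3, max_iter=500):
    """Iterate steps 2-7 of the algorithm; returns (loadings, iterations used)."""
    R = np.asarray(R, dtype=float)
    A_half = np.array(A_start, dtype=float)               # step 1: A_{1/2}
    for i in range(1, max_iter + 1):
        d2 = 1.0 - np.sum(A_half ** 2, axis=1)            # step 2: diagonal of D_i^2
        J_half = (A_half.T / d2) @ A_half                 # step 3: A' D^{-2} A
        eigvals, Q = np.linalg.eigh(J_half)               # step 4: J_i = Q' J Q (diagonal)
        # Assumed convention: make the largest entry of each eigenvector positive.
        pivot = np.argmax(np.abs(Q), axis=0)
        Q = Q * np.sign(Q[pivot, np.arange(Q.shape[1])])
        A = A_half @ Q                                    # step 5: rotated loadings A_i
        A_next = ((R / d2) @ A - A) / eigvals             # step 6: (R D^{-2} - I) A J^{-1}
        if np.max(np.abs(A_next - A_half)) < eps:         # step 7: convergence test
            return A_next, i
        A_half = A_next
    return A_half, max_iter

# Hypothetical two-factor pattern used to build an exactly fitting correlation matrix.
A_true = np.array([[.8, .1], [.7, .2], [.6, -.3], [.5, .4], [.4, -.5], [.3, .6]])
R_demo = A_true @ A_true.T
np.fill_diagonal(R_demo, 1.0)

A_fit, n_iter = ml_factors_diagonalized(R_demo, A_true)
print(n_iter)
print(np.round(A_fit, 3))
```

Because the demonstration matrix is reproduced exactly by two factors, the starting pattern is already a solution; the procedure merely rotates it to the position in which J is diagonal, and the off-diagonal correlations are reproduced by the returned loadings.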

Because the maximum-likelihood method has not been proved to converge (and, 
when it appears to converge it may be only to a relative maximum point), it is wise 
to apply an additional test. The likelihood function L must increase, of course, or, 
a negative fraction of it, as in (10.13), must decrease. The difference in this function 
of the likelihood from one iteration to the next is determined, and if it should fail to 
go down then there is evidence that the particular maximum point was missed, and 
the process has proceeded to another “mound.” One could back off and try different 
values for the matrix A in the preceding iteration, or proceed with the new values 
(for which the likelihood function failed to behave properly) and continue to seek 
convergence to a new relative maximum. Incidentally, the test of behavior of the 
likelihood function should not be confused with the convergence test in step 7 of 
(10.26).

Another promising new algorithm, and a computer program, has been developed
by Hemmerle [223]. In this paper, he indicates that in extracting 8 factors from a
matrix of order 15 his method is almost five-fold more efficient than the original
Lawley iteration scheme described above.

Probably the most important recent work on the maximum-likelihood solution
is that of Jöreskog [288]. He has devised a very efficient computational method that
converges rapidly, regardless of the values with which the process is started. Furthermore,
provision is made for the Heywood case. When the maximization of the
likelihood function leads to one or more variables with uniquenesses essentially 
zero, an adjustment is made for such variables as follows: (1) the principal components 
of any such variables are obtained; (2) the remaining variables, with the effect of the 
recalcitrant variables partialed out, are analyzed by the maximum-likelihood method; 
(3) the two results are combined to give a complete maximum-likelihood solution for 
the total set of variables. Such a solution is discussed in 10.6. 

Before leaving the theoretical development of maximum-likelihood estimates of 
factor loadings, similar results obtained by another procedure should be noted. 
Rao [394] has developed a method—canonical factor analysis—primarily as an 
alternative to the principal factor method (chap. 8). Instead of asking that each factor 
account for as much as possible of the variation of the variables, he inquires for the 
factors which are maximally related to the variables. He solves this problem by a 
canonical correlation analysis of the hypothetical factors with the observed variables, 
employing the basic factor model (2.9). The results of this analysis are again maximum- 
likelihood estimates of factor loadings, but formally different from the earlier solution 
in this section. The canonical factor analysis method leads to an iterative procedure 
which Golub [160] has programmed for the Illiac. Even on such an electronic computer 
the process is slow. A numerical example solved by this procedure is given in 10.6. 

10.4. Test of Significance for the Number of Factors 

The particular statistical test which is the subject of this section is for the determination
of the number of significant common factors. Such a test is needed because
implicit in the development of 10.3 is an assumption regarding this number.
Fortunately, the problem of testing the hypothesis of a given number of factors has
been solved for the case of maximum-likelihood estimates of the factor loadings
based upon large samples. The solution rests on a theorem [see 369, p. 301] due to
Wilks [523], which states that -2 times the logarithm of the likelihood ratio* is
approximately distributed as $\chi^2$ when N is large. This statistic, following Anderson
and Rubin [14, p. 136], may be put in the form,

(10.27)    $U_m = -2\log\lambda = N\log\frac{|P|}{|R|},$

with base e for the logarithms. It should be noted that the likelihood ratio $\lambda$ depends
only on the sample observations—the sample correlation matrix and the estimate
of the universe correlation matrix under the hypothesis $H_0$ of m factors. Since the
likelihood ratio varies from 0 to 1, the expression in (10.27) increases as $\lambda$ decreases,
approaching infinity as $\lambda$ approaches zero, and the critical region for $U_m$ is the
right-hand tail of the $\chi^2$ distribution. The test procedure, then, is to reject the hypothesis
$H_0$ of m factors if the value of $U_m$ exceeds $\chi^2$ for the desired significance level;
otherwise it is accepted. If the hypothesis is rejected, an alternate hypothesis of some
larger number of factors may be assumed to explain the observed correlations.

* The likelihood ratio is defined [369, p. 298] as the quotient of: the maximum of the likelihood
function in a specified subspace with respect to the parameters (e.g., the space of the m common
factors, with dimension nm - m[m - 1]/2); to the maximum of the likelihood function in the
total region with respect to the parameters (e.g., the total space of the n variables, with dimension
n[n - 1]/2).

In applying the foregoing test, it is necessary to know the degrees of freedom
associated with the $\chi^2$ distribution. This number is the difference between the
dimensionalities of the two regions (with respect to the parameters) involved in the
likelihood ratio. For the hypothesis $H_0$ of m common factors, the number of degrees
of freedom is given by

(10.28)    $v = \frac{1}{2}[(n - m)^2 - n - m].$
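The degrees-of-freedom formula is simple enough to check by direct arithmetic. The sketch below uses a hypothetical helper name; the two calls use the dimensions of examples discussed in this chapter (nine variables with one factor, and eight variables with two or three factors).

```python
def degrees_of_freedom(n, m):
    """Degrees of freedom v = [(n - m)^2 - n - m] / 2 for n variables and m factors."""
    # (n - m)^2 and (n + m) always have the same parity, so the division is exact.
    return ((n - m) ** 2 - n - m) // 2

v_nine_one = degrees_of_freedom(9, 1)     # nine variables, one common factor
v_eight_two = degrees_of_freedom(8, 2)    # eight variables, two common factors
v_eight_three = degrees_of_freedom(8, 3)  # eight variables, three common factors
print(v_nine_one, v_eight_two, v_eight_three)
```

Each additional factor uses up parameters, so the degrees of freedom shrink quickly as m increases for a fixed number of variables.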

For computing purposes, formula (10.27) can be simplified. By formally evaluating
the ratio of the two determinants and making a series of approximations [see 320,
p. 80], this expression can be reduced to

(10.29)    $U_m = N\sum_{j<k=1}^{n} \bar{r}_{jk}^2 / (d_j^2 d_k^2),$

where, as previously defined, the residuals are given by:

    $\bar{r}_{jk} = r_{jk} - r'_{jk},$

and the reproduced correlations $r'_{jk}$ are the elements of P.

For moderate-sized samples, Bartlett [28, pp. 82, 84] proposes the following
multiplying coefficient as more accurate than the N in (10.27):

(10.30)    $N - n/3 - 2m/3 - 11/6.$

Such a modification can also be employed in the computing formula (10.29). It
should be remembered that the $U_m$ statistic was developed for large-sample tests.
While a slight correction for a moderate-size sample may be in order, the coefficient
(10.30) cannot make the statistic any more appropriate when N is relatively small.
The suggestion in this text is that the tests be employed only for large samples.
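The computing formula and Bartlett's multiplier can be sketched as below. This is an illustration under stated assumptions: the function name is hypothetical, the factors are taken as orthogonal so that the reproduced correlations are the off-diagonal elements of AA', the uniquenesses are taken as one minus the communalities, and the three-variable example is invented purely to make the arithmetic visible.

```python
import numpy as np

def approx_u(R, A, N, bartlett=False):
    """Approximate U_m: multiplier times the sum over j < k of residual^2 / (d_j^2 d_k^2)."""
    R = np.asarray(R, dtype=float)
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    d2 = 1.0 - np.sum(A ** 2, axis=1)           # uniquenesses d_j^2
    resid = R - A @ A.T                         # residuals; only j < k entries are used
    j, k = np.triu_indices(n, k=1)
    total = np.sum(resid[j, k] ** 2 / (d2[j] * d2[k]))
    # Bartlett's coefficient replaces N for moderate-sized samples.
    coef = N - n / 3 - 2 * m / 3 - 11 / 6 if bartlett else N
    return coef * total

# Hypothetical one-factor example: only r_12 departs from the reproduced value.
A_demo = np.array([[.8], [.7], [.6]])
R_demo = np.array([
    [1.00, .61, .48],
    [.61, 1.00, .42],
    [.48, .42, 1.00],
])
u_plain = approx_u(R_demo, A_demo, N=100)
u_bartlett = approx_u(R_demo, A_demo, N=100, bartlett=True)
print(u_plain, u_bartlett)
```

With n = 3 and m = 1 the Bartlett coefficient is N - 3.5, so the corrected statistic is slightly smaller than the uncorrected one; the correction matters less and less as N grows.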

The foregoing test makes it possible to determine the statistical significance of the
assumed number m of common factors at a given level of confidence. If m* is the
true number of common factors then it is to be expected that by selecting m ≥ m* a
good fit will be obtained, while for m < m* the fit will be poor. By this is meant that
$U_m$ for the former case will be small and for the latter case it will be large. A reasonable
procedure is to select a confidence level, say 5%, and take as the estimate of
the number of factors the smallest value of m which yields a nonsignificant $U_m$ when
compared with $\chi^2$ at this level. Usually one would take sequential values of m,
starting with a subjective judgment of the smallest number of factors required,
making the test at each stage, and increasing m by one when $U_m$ is significant, until
a value is reached for which the corresponding $U_m$ is nonsignificant. It should
be noted that under the foregoing procedure the exact probability of concluding that
m > m* under this joint test of the several
hypotheses m = 1, 2, ···, m* is less than (or equal to) the probability of rejecting the
single hypothesis m = m*, at a given significance level.

The techniques described in this chapter are scientifically sound and powerful.
They are also costly in regard to computing effort. Do they offer a more general
guide for factor analysis? There is evidence that the test of significance for the number
of common factors has applicability to situations where the maximum-likelihood
method is not employed. Thus, if for any factorization the function $U_m$ is computed
and found to be insignificant when considered as a $\chi^2$ then the particular factor solution
may be assumed to contain the correct number of common factors and have an
acceptable pattern of factor loadings. On the other hand, if it is found to be significant
no conclusion can be reached. It might still be true that a maximum-likelihood
factorization with the same number of common factors would give an acceptable
statistical fit. In other words, the test provides an upper limit for the number of
common factors unless the factor loadings are estimated by an efficient method.

Before leaving this test, a comparison with the corresponding test in 9.5 is in
order. There, it will be recalled, the statistic (9.31) was developed without specifying
efficient estimates of the factor loadings (as in the present case of maximum-likelihood
estimates). A consequence of this is that the number of degrees of freedom given by
(9.32) is n greater than that given by (10.28). Hence the power* of the test in 9.5 is
lower than in the present instance.

In addition to the mathematical contributions of Lawley and others, an empirical
demonstration of the validity of the test has been provided by Henrysson [225].
Working with a table of random numbers from a normal population, he built 12
samples of 9 variables each containing 200 observations. After computing the
variance-covariance matrix for each sample, he proceeded to estimate the factor
loadings by the method of maximum likelihood under the assumption of m = 1
and to calculate $U_1$ by (10.29). For 27 degrees of freedom, the 12 values of $U_1$ ranged
from 15.5 to 38.6 with associated $\chi^2$ probabilities from .96 to .07. Assuming the values
of P to be distributed rectangularly, the empirical results agree very well with the
expected range of .85. Furthermore, upon combining the twelve samples, only one
common factor was found to be significant for the explanation of the artificial data
as hypothesized.
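The quoted "expected range of .85" can be checked by a short calculation. The sketch below relies on a standard order-statistics result that is not stated in the text and is therefore an assumption about how the figure was obtained: for n independent values from a rectangular distribution on (0, 1), the expected range is (n - 1)/(n + 1).

```python
from fractions import Fraction

n_samples = 12                                   # the twelve samples of the study
# Assumed formula: expected range of n uniform(0, 1) values is (n - 1)/(n + 1).
expected_range = Fraction(n_samples - 1, n_samples + 1)

# Observed range of the chi-square probabilities reported in the text.
observed_range = 0.96 - 0.07

print(expected_range, round(float(expected_range), 2), round(observed_range, 2))
```

The observed spread of the twelve probabilities is thus close to what a rectangular distribution would produce, which is the sense in which the empirical results agree with expectation.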

10.5. Computing Procedures

While it is not expected that the maximum-likelihood method will be employed
to estimate factor weights without very powerful computing facilities, nonetheless
an outline of the procedure may be very useful. This can serve two purposes: (1) as
a means of grasping the technique for a small number of variables and factors and
(2) as a basis for writing a computer program for handling larger problems according
to (10.25). In the following outline, the computing procedures are illustrated by means
of the example of eight physical variables of 5.4.

* By the power of a statistical test of an hypothesis is meant the probability that it rejects the
alternative hypothesis when that alternative is false. The power is greatest when the probability
of an error of the second kind (i.e., accepting a false hypothesis) is least.

1. Organization of data.—The complete correlation matrix (with unities in the
diagonal) constitutes the basic data. For the example, the correlations from Table 5.3 
are repeated in the matrix R in Table 10.2. 


Table 10.2

Correlation Matrix for Eight Physical Variables: R

         1       2       3       4       5       6       7       8
1    1.000    .846    .805    .859    .473    .398    .301    .382
2     .846   1.000    .881    .826    .376    .326    .277    .415
3     .805    .881   1.000    .801    .380    .319    .237    .345
4     .859    .826    .801   1.000    .436    .329    .327    .365
5     .473    .376    .380    .436   1.000    .762    .730    .629
6     .398    .326    .319    .329    .762   1.000    .583    .577
7     .301    .277    .237    .327    .730    .583   1.000    .539
8     .382    .415    .345    .365    .629    .577    .539   1.000


2. Hypothesis regarding number of common factors (m).—An assumption regarding
the number of common factors must be made at the outset. This hypothesis is made 
on the basis of all available information, including any previous studies involving 
the given variables. The eight physical variables have been analyzed by many different 
methods and different investigators, but always with the same conclusion of two 
common factors. Hence, the immediate thought was to assume m = 2. Maximum- 
likelihood estimates were obtained under this assumption but a test of this hypothesis 
showed that at least three factors are statistically significant (see exs. 10 and ). 
Therefore three common factors are assumed in the following calculations (and, as 
will be noted below, even this number is not sufficient). With m = 3 all ramifications 
of the computing procedures are exemplified, making the extension to a larger
number of factors easy.

3. Initial set of trial values.—The iterative procedure, which ultimately leads to
the maximum-likelihood estimates, is begun with some arbitrary first approximations
to the factor loadings. While almost any set of numbers may do, the closer the approximation
to the actual weights the faster will the process converge. Usually some
arbitrary factor analysis of the data, employing any of the methods described in
earlier chapters, will do for the initial set of trial values.* As indicated above, an actual
maximum-likelihood solution under assumption of m = 2 had been computed (for
which the initial set of trial values came from a principal-factor solution). Hence, for
the present needs, the trial values for the first two factors are taken to be the previously
computed maximum-likelihood weights. For the third factor, the trial values come
from the third eigenvector of a principal-factor solution that was readily available.
These initial factor loadings appear in the first three rows of Table 10.3 (they are
written in rows rather than columns simply for convenience of printing). The uniquenesses
are computed from these values and appear in line 4. The entries in the first
four lines of Table 10.3 constitute the initial set of trial values.

* An initial minres solution has been found to hasten the convergence process.

Table 10.3

Maximum-Likelihood Estimates of Three Factors for Eight Physical Variables

                                                           Variable (j)
Line  Instruction                           1        2        3        4        5        6        7        8
  1   a_j1                               .894     .896     .866     .871     .656     .554     .490     .551
  2   a_j2                              -.182    -.301    -.291    -.209     .663     .577     .587     .403
  3   a_j3                              -.070     .078     .011    -.101    -.148    -.051    -.292     .592
  4   d_j^2 = 1 - L1^2 - L2^2 - L3^2     .1627    .1005    .1652    .1875    .1082    .3576    .3301    .1835
  5   L1/L4                             5.495    8.915    5.242    4.645    6.083    1.549    1.484    3.003
  6   L2/L4                            -1.119   -2.995   -1.762   -1.115    6.128    1.614    1.778    2.196
  7   L3/L4                             -.430     .776     .067    -.539   -1.368    -.143    -.885    3.226
  8   RL5                              26.326   26.461   25.426   25.662   20.183   17.060   15.316   17.813
  9   L8 - L1                          25.432   25.565   24.560   24.791   19.527   16.506   14.826   17.262
 10   a_j1 = L9/√(L5·L9)                 .883     .888     .853     .861     .678     .573     .515     .599
 11   RL6                              -1.114   -2.181   -2.172   -1.375    7.226    6.238    6.426    5.255
 12   L11 - L2                         -0.932   -1.880   -1.881   -1.166    6.563    5.661    5.839    4.852
 13   L12 - (L6·L10)L10                -1.992   -2.946   -2.905   -2.199    5.749    4.973    5.221    4.133
 14   a_j2 = L13/√(L6·L13)              -.223    -.330    -.325    -.246     .644     .557     .585     .463
 15   RL7                                .079     .559     .310     .031    -.215     .086    -.303    1.790
 16   L15 - L3                           .149     .481     .299     .132    -.067     .137    -.011    1.198
 17   L16 - (L7·L10)L10                 -.177     .153    -.016    -.186    -.318    -.075    -.201     .977
 18   L17 - (L7·L14)L14                 -.185     .142    -.027    -.194    -.296    -.056    -.181     .993
 19   a_j3 = L18/√(L7·L18)              -.092     .070    -.013    -.096    -.147    -.028    -.090     .492
 20   d_j^2 = 1 - L10^2 - L14^2 - L19^2  .1621    .0977    .1666    .1889    .1040    .3606    .3844    .1848

      (Three iterations omitted in order to conserve space)

 69   L58/L68                           5.503    9.338    4.994    4.446    8.099    1.635    1.274    3.068
 70   L62/L68                          -1.604   -3.906   -2.111   -1.459    7.171    1.464    1.307    2.155
 71   L67/L68                           -.642     .875    -.077    -.481   -1.661    -.077    -.084    2.461
 72   RL69                             27.279   27.270   26.172   26.492   22.155   18.610   16.674   19.115
 73   L72 - L58                        26.404   26.395   25.332   25.642   21.448   18.016   16.142   18.503
 74   a_j1 = L73/√(L69·L73)              .874     .874     .838     .849     .710     .596     .534     .612
 75   RL70                             -2.670   -3.898   -3.767   -2.932    6.931    5.869    6.015    4.720
 76   L75 - L62                        -2.415   -3.532   -3.412   -2.653    6.305    5.337    5.469    4.290
 77   L76 - (L70·L74)L74               -2.553   -3.670   -3.544   -2.787    6.193    5.243    5.385    4.193
 78   a_j2 = L77/√(L70·L77)             -.258      ...    -.358    -.281     .625     .529     .544     .423
 79   RL71                              -.278     .215    -.035    -.250    -.447    -.125    -.141    1.242
 80   L79 - L67                         -.176     .133    -.022    -.158    -.302    -.097    -.106     .751
 81   L80 - (L71·L74)L74                -.147     .162     .006    -.130    -.279    -.077    -.088     .771
 82   L81 - (L71·L78)L78                -.167     .133    -.022    -.152    -.230    -.035    -.045     .804
 83   a_j3 = L82/√(L71·L82)             -.102     .081    -.013    -.093    -.141    -.021    -.028     .493
 84   d_j^2 = 1 - L74^2 - L78^2 - L83^2  .1592    .0927    .1694    .1916    .0854    .3645    .4181    .2035

4. Factor weights divided by uniquenesses.—The calculations implied in the iteration
equations (10.25) are organized systematically in this and the following four steps,
without always translating the matrix algebra explicitly into equivalent arithmetic
operations. It will be noted that in getting the second approximation for each vector
of factor loadings, the original vector ($a_1$, $a_2$, or $a_3$) is multiplied by the inverse ($D^{-2}$)
of the diagonal matrix of uniquenesses. This is equivalent to dividing the factor loadings
of a variable by its uniqueness. These results appear in lines 5, 6, 7 of Table 10.3.

5. Determination of next trial values of first-factor weights.—The first of equations
(10.25) is reduced to convenient form for arithmetic computation in lines 8, 9, 10,
with the first term becoming

    $RD^{-2}a_1 = RL_5,$

which is recorded as $L_8$. It should be perfectly clear when expressions that are really
column vectors are recorded in rows for ease of printing, and transposes are not
indicated to avoid pedantry.

The numerators for the second approximation to the first-factor weights are
simply obtained in $L_9$. The denominators are more involved. Considering $L_5$ and
$L_9$ as vectors, the expression under the radical is the dot product of these vectors as
defined in 3.2 paragraph 19. The square-root of this dot product is a constant by
which each entry in line 9 is divided to get the next approximation $b_{j1}$ according to
(10.25). Actually, these factor loadings are again written as $a_{j1}$ to simplify the notation.

6. Determination of next trial values of second-factor weights. The next
approximations to the second-factor loadings are provided by the second of equations
(10.25). The rather imposing matrix manipulations are reduced to straightforward
calculations in lines 11-14 of Table 10.3. The work is quite similar to that of
paragraph 5 except for the additional step which corrects for the previously revised values
of the first-factor weights. This correction is the term

$b_1 b_1' D^{-2} a_2$

from (10.25). Translated into terms of Table 10.3, this becomes

$L_{10}(L_{10}' L_6)$ or $L_6' L_{10} L_{10}'$ (transposed),

wherein careful attention is paid to the designation of the elements as column vectors
or row vectors (with primes). Written simply, the correction appears in line 13.
The remaining calculations leading to $b_{j2}$ (but written $a_{j2}$) are
essentially the same as in the case of the first factor.
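The correction term is easiest to compute by forming the scalar $L_{10}'L_6$ first and then scaling $L_{10}$ by it, so that no outer-product matrix is ever needed. The following Python sketch shows that ordering; the function name and the numbers are hypothetical and are not taken from Table 10.3.

```python
def first_factor_correction(L10, L6):
    """Return L10 * (L10' L6), the correction for the revised first-factor weights.

    L10 : new first-factor weights b_1
    L6  : D^{-2} a_2, the second-factor loadings divided by the uniquenesses
    """
    scalar = sum(b * x for b, x in zip(L10, L6))   # dot product L10' L6
    return [scalar * b for b in L10]               # scale each first-factor weight

# Hypothetical three-variable illustration.
L10 = [0.8, 0.6, 0.7]
L6 = [0.5, -1.0, 0.25]
correction = first_factor_correction(L10, L6)      # scalar is 0.4 - 0.6 + 0.175 = -0.025
```

The same helper would serve for the third factor, where one such correction is needed for each previously revised factor.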

7. Determination of next trial values of third-factor weights. The third-factor
loadings are calculated in lines 15-19 of Table 10.3 on the basis of the third of the
iteration equations (10.25). Again, while the matrix algebra appears very imposing,
the actual calculations follow the same pattern as in the case of the first and second
factors. There is now a correction for the effect of the new first-factor weights (in
$L_{17}$) and also a correction for the new second-factor weights (in $L_{18}$). If there were
more than three common factors there would be additional such correction terms for
each of the previously computed factors.

8. Determination of next trial values of uniquenesses. After the new approximations
to the maximum-likelihood estimates of the three sets of factor loadings have
been determined in lines 10, 14, 19, new values of the uniquenesses are computed in line 20.

9. Convergence of factor weights. The calculations involved in paragraphs 4-8
are repeated until the factor loadings in successive iterations do not change (to a
designated number of decimal places). In the illustrative example, successive values
of all corresponding factor loadings agreed to within .007 after 5 iterations. This is
not sufficiently accurate, and is accepted here only for illustrative purposes. Such
final values appear in lines 74, 78, 83, with the associated uniquenesses in line 84.
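The stopping rule of paragraph 9 amounts to comparing the largest absolute change in any loading with a chosen tolerance. A minimal Python sketch follows; the function names, the tolerance values, and the small loading matrices are hypothetical and serve only to show the comparison.

```python
def max_change(old, new):
    """Largest absolute difference between corresponding loadings."""
    return max(abs(n - o)
               for row_old, row_new in zip(old, new)
               for o, n in zip(row_old, row_new))

def converged(old, new, tol=0.0005):
    """True when no loading changed by as much as tol between iterations."""
    return max_change(old, new) < tol

# Hypothetical loadings from two successive iterations (two variables, two factors).
old = [[0.845, 0.340], [0.841, 0.434]]
new = [[0.850, 0.338], [0.834, 0.436]]

change = max_change(old, new)          # largest change is about .007
loose = converged(old, new, tol=0.01)  # passes a loose standard
strict = converged(old, new)           # fails the default standard of .0005
```

A change of .007, as in the illustrative example, passes only a very loose standard, which is why the text treats the hand-computed result as crude.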

10. Residual matrix. To assist in calculating the residuals, the reproduced
correlations are first determined according to (2.50), since the factors are orthogonal.
These appear in the upper half of Table 10.4. Then, subtracting these values
from the observed correlations of Table 10.2, the residual correlations
($\bar{r}_{jk}$) are obtained and recorded in the lower half of Table 10.4.
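For orthogonal factors, a reproduced correlation is the sum of products of the two variables' loadings, and the residual is the observed value minus that sum. The Python sketch below illustrates this with the loadings of variables 1 and 2 from Solution 2 of Table 10.6. The observed correlation is recovered here as the reproduced value plus the residual shown in Table 10.4 (.851 and -.005), since Table 10.2 is not part of this passage. The function name is hypothetical.

```python
def reproduced_correlation(loadings, j, k):
    """Sum of products of the loadings of variables j and k (orthogonal factors)."""
    return sum(x * y for x, y in zip(loadings[j], loadings[k]))

# Loadings of variables 1 and 2 on three factors (Solution 2, Table 10.6).
A = [
    [0.862, 0.320, 0.164],
    [0.867, 0.432, -0.243],
]

r12_reproduced = reproduced_correlation(A, 0, 1)

# Observed r12 recovered from Table 10.4: reproduced .851 plus residual -.005.
r12_observed = 0.851 + (-0.005)
residual = r12_observed - r12_reproduced
```

With these loadings the residual for this pair is smaller than the -.005 of the hand-computed solution, in line with the better fit reported for Solution 2.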

11. Statistical test of hypothesis regarding number of factors. The hypothesis of
m = 3 for the illustrative example can now be tested. To assist in calculating the
statistic as given by (10.29), the intermediate elements are computed in Table 10.5.
The sum of the 28 elements below the diagonal is .15098, and upon multiplying this
by the number of cases N = 305 the resulting statistic is $U_3 = 46.0$. The degrees of
freedom, as given by (10.28), is $\nu = 7$. By referring to Table D in the Appendix, it will
be found that for 7 degrees of freedom a value of $\chi^2 = 18.5$ produces a probability
Table 10.4

Reproduced Correlations and Residuals^a

Variable      1       2       3       4       5       6       7       8
   1                 .851    .826    .824    .474    .387    .329    .375
   2        -.005            .864    .838    .378    .323    .263    .418
   3        -.021    .017            .813    .373    .310    .253    .355
   4         .035   -.012   -.012            .440    .359    .303    .355
   5        -.001   -.002    .007   -.004            .757    .723    .629
   6         .011    .003    .009   -.030    .005            .607    .578
   7        -.028    .014   -.016    .024    .007   -.024            .543
   8         .007   -.003   -.010    .010    .000   -.001   -.004

^a Reproduced correlations appear in the upper triangle and the residuals in the lower triangle.


Table 10.5

Elements for Calculation of $\chi^2$ Test^a

Variable       1         2         3         4         5         6         7         8
   1         6.2854    .00002    .00044    .00122    .00000    .00012    .00078    .00005
   2         .00136   10.7875    .00029    .00014    .00000    .00001    .00020    .00001
   3         .01633    .01847    5.9032    .00014    .00005    .00008    .00026    .00010
   4         .04000    .00788    .00431    5.2192    .00002    .00090    .00058    .00010
   5         .00000    .00000    .00346    .00122   11.7096    .00002    .00005    .00000
   6         .00207    .00030    .00130    .01289    .00064    2.7435    .00058    .00000
   7         .01173    .00516    .00367    .00724    .00140    .00381    2.3918    .00002
   8         .00154    .00053    .00290    .00256    .00000    .00000    .00024    4.9140

^a The elements in this table consist of the following: (1) Above diagonal: $\bar{r}_{jk}^2$, squares of residual
correlations; (2) In diagonal: $1/d_j^2$, reciprocals of uniqueness; (3) Below diagonal: $\bar{r}_{jk}^2/d_j^2 d_k^2$, elements which are
summed to get the function (10.29).


P = .01. Therefore, if the hypothesis of only three common factors is correct, the
probability of getting a value of $\chi^2 > 18.5$ is only 1 in 100. Since the actual value
46.0 is considerably in excess of 18.5, the hypothesis is rejected and it must be assumed
that more than three common factors are required for adequate explanation of the
observed data. This is the conclusion on purely statistical grounds, but it is doubtful
if a factor analyst or applied statistician would look for any more (practical)
relationships among the variables from the third-factor residuals in the lower half of
Table 10.4.
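The test in paragraph 11 can be retraced numerically. The Python sketch below sums the 28 below-diagonal elements of Table 10.5, multiplies by N = 305, and compares the result with the critical value 18.5 quoted in the text. The degrees-of-freedom function uses the standard expression ½[(n - m)² - n - m], which is assumed here to be what (10.28) states because it gives the 7 degrees of freedom reported. All names are hypothetical.

```python
N_CASES = 305
N_VARIABLES, N_FACTORS = 8, 3
CHI2_CRITICAL_P01 = 18.5          # value quoted in the text for 7 degrees of freedom

# Below-diagonal elements of Table 10.5, listed column by column.
below_diagonal = [
    0.00136, 0.01633, 0.04000, 0.00000, 0.00207, 0.01173, 0.00154,   # column 1
    0.01847, 0.00788, 0.00000, 0.00030, 0.00516, 0.00053,            # column 2
    0.00431, 0.00346, 0.00130, 0.00367, 0.00290,                     # column 3
    0.00122, 0.01289, 0.00724, 0.00256,                              # column 4
    0.00064, 0.00140, 0.00000,                                       # column 5
    0.00381, 0.00000,                                                # column 6
    0.00024,                                                         # column 7
]

def degrees_of_freedom(n, m):
    """Assumed form of (10.28): one half of (n - m)^2 - n - m."""
    return ((n - m) ** 2 - n - m) // 2

U3 = N_CASES * sum(below_diagonal)                # about 46.0
nu = degrees_of_freedom(N_VARIABLES, N_FACTORS)   # 7 for n = 8, m = 3
reject_three_factors = U3 > CHI2_CRITICAL_P01
```

The five-decimal entries sum to slightly more than the .15098 quoted in the text, a difference that comes only from rounding and does not affect the conclusion.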


10.6. Numerical Illustrations 

It should be quite evident from the work in the preceding section that the 
maximum-likelihood method for estimating factor weights requires tremendous 
computations. While this method implied insuperable computing difficulties in the
past, causing it to lie dormant for almost two decades, the modern electronic
computers make the method feasible. Still, there have been few numerical examples 
reported in the literature. 

Additional problems will be considered in this section, but first a few more com¬ 
ments about the example of the eight physical variables. It came as a surprise to find 
that two factors inadequately reproduced their intercorrelations after the many 
crude guides pointed to precisely two clearly distinguishable clusters. What was even 
more surprising, then, was the large $U_m$ for the hypothesis m = 3, indicating that a
fourth factor really should be sought. At this point it was only natural to question 
the numerical calculations, and finding them to be accurate, to question the theoretical 
basis. One area of doubt was the approximations in going from formula (10.27) to 
the simpler computing formula (10.29). To this end, the determinants of the observed 
correlations and of the reproduced correlations (with two and with three factors) 
were calculated. These values were then employed in formula (10.27) to get $U_2 = 69.9$
and $U_3 = 51.0$ for the two-factor and three-factor cases, respectively. The
corresponding values by use of formula (10.29) are 80.2 and 46.0. Of course the numbers differ
a little, but the same conclusions would be drawn from the results of either formula 
—rejecting the hypothesis of two factors in the first case, and of three factors in the 
second case. This analysis lends credence to the use of the simpler formula. 

In the course of experimenting with different maximum-likelihood procedures, the 
example of eight physical variables was used extensively. Such computer programs 
included two provided by Dr. Donald F. Morrison (when he was associated with the 
National Institutes of Health); a procedure* of Bargman [23] which is based on the 
model developed by Howe [268]; and the program outlined in (10.26). The two most 
interesting results are presented in Table 10.6. The first of these solutions was obtained 
by means of a program made available by the National Institutes of Health, starting 
with the first three principal components and converging to a maximum difference 
of .0005 in factor loadings of successive iterations. Solution 2 was obtained by means 
of a program [204] based upon (10.26), with the minres solution of Table 9.3 (for 
m = 3) as the initial approximations to the factor loadings. Convergence in the factor 
weights to within .005 was reached in about one minute on a Philco 2000 computer. 
It will be noted that these two solutions, although in canonical form, differ from each 
other. Furthermore, they are different from the crude results in Table 10.3. The 
disparate nature of the two solutions can be seen immediately from the wide discrep¬ 
ancies in factor loadings and communality for variable 8. No matter how similar the 
other numbers might appear, the complete solutions could not be the same in the 
light of the different fit to one of the variables. Diverse maximum-likelihood solutions
for the same data arise, in part, because of the different standards for convergence
(only 5 iterations when calculated by hand versus hundreds of iterations in one
experimental run on a computer) but also because a different relative maximum may

* Actually a step-wise process in which each factor in turn is estimated by maximum-likelihood, 
but the result for a set of several factors is not the same as a maximum-likelihood solution for the 
specified number of factors. 
Table 10.6

Two Maximum-Likelihood Solutions for Eight Physical Variables
(Three common factors)

Variable             Solution 1                          Solution 2
   j        F_1     F_2     F_3    h_j^2        F_1     F_2     F_3    h_j^2
   1       .845    .340    .065    .834        .862    .320    .164    .873
   2       .841    .434   -.084    .902        .867    .432   -.243    .998
   3       .806    .428    .006    .833        .808    .388   -.051    .807
   4       .819    .360    .047    .802        .833    .343    .180    .843
   5       .764   -.560    .272    .971        .748   -.584    .090    .910
   6       .629   -.452    .106    .610        .625   -.499    .007    .640
   7       .568   -.471    .116    .558        .569   -.515   -.020    .589
   8       .678   -.461   -.542    .966        .603   -.344   -.164    .509
  V_p     4.503   1.569    .405   6.477       4.481   1.532    .156   6.169


be sought, depending on the starting point in the iteration process and other aspects 
of the algorithm. 

It is of interest to note the statistical inference that might be drawn from each of 
the solutions. The solution in Table 10.3, based on only 5 iterations on a desk cal¬ 
culator, is understandably crude, and the statistical test in 10.5, step 11, led to the 
conclusion that three common factors were not sufficient to explain the observed 
data. For solution 1 of Table 10.6, the statistic (10.27) becomes 

$U_3 = 305 \log_e (.00113253/.00096740) = 48.1$,

and since this is even larger than the value of 46.0 obtained in the preceding case, the 
same conclusion is reached. Solution 2, however, leads to a somewhat different 
inference. For that solution, the determinant of reproduced correlations is reduced 
to .00104685 and the associated statistic $U_3 = 24.1$. While this value is greater than
$\chi^2 = 18.5$ with a probability P = .01 for $\nu = 7$ degrees of freedom, it does not exceed
$\chi^2 = 24.3$ for P = .001. Therefore, solution 2 may be accepted at the 0.1 per cent
level of significance. 
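The two statistics quoted above follow directly from the determinants given in the text. A short Python check is sketched below; it assumes, from the way the ratio is written, that .00096740 is the determinant of the observed correlations and that the other two values are determinants of reproduced correlations. The function and variable names are hypothetical.

```python
import math

def likelihood_ratio_statistic(n_cases, det_reproduced, det_observed):
    """N times the natural log of the ratio of determinants, as in the displayed formula."""
    return n_cases * math.log(det_reproduced / det_observed)

DET_OBSERVED = 0.00096740      # assumed: determinant of the observed correlations
CHI2_P01, CHI2_P001 = 18.5, 24.3   # critical values quoted in the text, 7 d.f.

u_solution_1 = likelihood_ratio_statistic(305, 0.00113253, DET_OBSERVED)   # about 48.1
u_solution_2 = likelihood_ratio_statistic(305, 0.00104685, DET_OBSERVED)   # about 24.1

# Solution 2 exceeds the .01 critical value but not the .001 value.
solution_2_between = CHI2_P01 < u_solution_2 < CHI2_P001
```

The comparison makes the contrast concrete: solution 1 lies far beyond both critical values, while solution 2 falls just under the .001 value.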

The new maximum-likelihood procedure developed by Joreskog [288] was applied 
to the problem of eight physical variables. Four different sets of initial values for the 
uniquenesses were attempted (including those from the two different solutions of 
Table 10.6), and identical final results were obtained in each instance. No matter 
what starting values, the program determined that the uniqueness for variable 2 
would have to be zero to attain a maximum. The effect of that variable was partialed 
out, and two maximum-likelihood factors were obtained from the matrix of partial 
correlations among the remaining seven variables. The latter solution is then com¬ 
bined with the principal component through variable 2 to yield the maximum- 
likelihood solution for the eight variables in terms of three factors. Since the result 
is not in canonical form, the individual factor weights are not compared with those in
Table 10.6. However, it is of interest to note that the communalities:

.873, 1.000, .806, .844, .910, .641, .589, .509,

are almost identical with those of solution 2. Also, the test of goodness of fit of the 
new solution leads to the same conclusion as in the case of solution 2. It would 
appear from Joreskog’s work that solution 2 is the true maximum-likelihood solution 
while the procedure that led to solution 1 had not really converged. 

This example illustrates the general principle that one tends to underestimate the 
number of factors that are statistically significant. For twenty years, two factors had 
been considered adequate, but statistically two factors do not adequately account 
for the observed correlations based on a random sample of 305 girls. However, the 
third factor (whose total contribution to the variance ranges from 2 per cent to 5 per 
cent for the different solutions) has little “practical significance,” and certainly a 
fourth factor would have no practical value. 

The next illustration is for the 5 socio-economic variables introduced in 2.2. In 
making a choice of hypothesis regarding the number of common factors, one, two, 
or three, all appear somewhat reasonable, but the selection of m = 2 is more in 
keeping with the previous work. Two different computer programs were employed 
on a Philco 2000, and with three different convergence standards (maximum change in
factor loadings required to be less than .0005, .00001, and .000005) all the results were 
almost identical to three decimal places. In Table 10.7 the solution is shown in the 
arbitrary form produced by the basic maximum-likelihood program, and also after 
it has been put in the canonical form of 8.8. 

The striking similarity between this solution and the minres solution of Table 9.2 
may be noted. Also, for this example, the maximum-likelihood solution is almost 
identical with the principal-factor solution of Table 8.13. The total communality 


Table 10.7

Maximum-Likelihood Solution for Five Socio-Economic Variables
(Two common factors)

Variable     Arbitrary Form       Canonical Form      Communality
   j         F_1      F_2         F_1      F_2           h_j^2
   1        .999    -.008        .621     .783           .998
   2        .019     .899        .711    -.550           .809
   3        .974     .109        .697     .689           .961
   4        .446     .785        .891    -.147           .815
   5        .030     .960        .766    -.580           .923
  V_p      2.147    2.360       2.759    1.748          4.507
produced by the maximum-likelihood solution represents 90 per cent of the total
variance of the five variables. This is a very exceptional situation that should not 
ordinarily be expected. Of course, even this high proportion must be less than the 
amount accounted for by the first two principal components (see Table 8.1). While
the contributions of the factors and the total communality are of importance in both
the minres and maximum-likelihood solutions, it must be remembered that these 
methods have other primary objectives than maximizing variance, which is the case 
only in the principal-factor methods. 
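The quantities in Table 10.7 can be checked from the loadings themselves: a communality is the sum of squared loadings across a row, a factor contribution is the sum of squares down a column, and the communalities should be the same in the arbitrary and canonical forms apart from rounding. The Python sketch below does this with the three-decimal values of the table; the function names are hypothetical.

```python
# Loadings from Table 10.7 (five variables, two factors).
arbitrary = [(0.999, -0.008), (0.019, 0.899), (0.974, 0.109), (0.446, 0.785), (0.030, 0.960)]
canonical = [(0.621, 0.783), (0.711, -0.550), (0.697, 0.689), (0.891, -0.147), (0.766, -0.580)]

def communalities(loadings):
    """Sum of squared loadings for each variable."""
    return [sum(a * a for a in row) for row in loadings]

def contributions(loadings):
    """Sum of squared loadings for each factor."""
    return [sum(a * a for a in column) for column in zip(*loadings)]

h2_arbitrary = communalities(arbitrary)
h2_canonical = communalities(canonical)
v_canonical = contributions(canonical)

total_communality = sum(h2_arbitrary)                 # close to the 4.507 of the table
proportion = total_communality / len(arbitrary)       # about .90 of the total variance
largest_gap = max(abs(a - c) for a, c in zip(h2_arbitrary, h2_canonical))
```

Because the table entries are rounded to three decimals, the computed totals agree with the printed ones only to about the third decimal place.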

A fair sized problem in terms of maximum-likelihood factors is shown in Table 10.8. 
This is the problem of thirteen psychological tests for which several solutions in 
terms of three factors have been obtained previously. The calculations for Table 10.8 


Table 10.8

Maximum-Likelihood Solution for Thirteen Psychological Tests
(Five common factors)

Test               Pattern Coefficients                   Uniqueness   Communality
  j       F_1     F_2     F_3     F_4     F_5               d_j^2         h_j^2
  1      .492    .171    .512   -.058    .056               .460          .540
  2      .305    .042    .292   -.228   -.049               .766          .234
  3      .347   -.024    .544   -.034   -.150               .559          .441
  4      .425    .010    .392    .198   -.020               .626          .374
  5      .817   -.122   -.140   -.251   -.171               .206          .794
  6      .795   -.241   -.029    .068    .233               .250          .750
  7      .812   -.217   -.082    .239   -.032               .229          .771
  8      .716   -.008    .085    .149   -.186               .423          .577
  9      .799   -.283   -.089   -.083    .053               .264          .736
 10      .419    .622   -.359    .086   -.055               .298          .702
 11      .488    .474   -.057   -.173    .311               .407          .593
 12      .367    .663    .015    .063   -.132               .404          .596
 13      .550    .476    .266    .003    .075               .395          .605
 V_p    4.597   1.510   1.042    .291    .273                            7.713



were made on the Illiac, employing a canonical factor analysis program. As noted in 
10.3 such an analysis leads to maximum-likelihood estimates of factor loadings. Tests 
for the number of common factors show five factors significant at the 5 % level and 
three factors significant at the 1 % level. The latter agrees with the subjective judgments 
made earlier regarding the number of factors for practical consideration. 

A large-scale application of the maximum-likelihood method was made by Lord 
[342] in 1956, in a study of speed factors in tests and academic grades. Using Whirl¬ 
wind I, an early electronic computer, he analyzed 33 variables into 10 maximum- 
likelihood factors. Lord originally hypothesized at least 9 common factors, but he 
could not arrive at such a solution directly because imaginary numbers were produced 
for the factor loadings. He found that the method broke down unless extremely close
initial approximations to the actual solution were available whenever m was large.
Hence, he carried out the computations in stages: starting with m = 4 and somewhat
arbitrary trial values, he obtained maximum-likelihood estimates of the factor
loadings; then he took the hypothesis of m = 5, using the previously determined weights
for the first four factors and guesses based upon the residuals for the fifth; proceeding
in this manner he was able to obtain ten factors without encountering further
difficulty. Tests for the number of common factors produced a significant $\chi^2$, with a
probability well below the 1% level, in each instance from m = 4 through m = 9.
However, the $\chi^2$ dropped sufficiently so that ten factors were found to be significant
at the 7% level.

The final example to be considered consists of the twenty-four psychological tests 
whose correlations are given in Table 7.4. Rather than attempting to get a maximum- 
likelihood solution directly from the correlation matrix, with initial trial values of 


Table 10.9

Maximum-Likelihood Solution for Twenty-four Psychological Tests
(Four common factors)

  Test
   j           F_1      F_2      F_3      F_4     h_j^2
   1          .601     .019     .388     .221     .561
   2          .372    -.025     .252     .132     .220
   3          .413    -.117     .388     .144     .356
   4          .487    -.100     .254     .192     .349
   5          .691    -.304    -.279     .035     .648
   6          .690    -.409    -.200    -.076     .689
   7          .677    -.409    -.292     .084     .718
   8          .674    -.189    -.099     .122     .515
   9          .697    -.454    -.212    -.080     .743
  10          .476     .534    -.486     .092     .757
  11          .558     .332    -.142    -.090     .450
  12          .472     .508    -.139     .256     .566
  13          .602     .244     .028     .295     .510
  14          .423     .058     .015    -.415     .354
  15          .394     .089     .097    -.362     .304
  16          .510     .095     .347    -.249     .451
  17          .466     .197    -.004    -.381     .402
  18          .515     .312     .152    -.147     .407
  19          .443     .089     .109    -.150     .238
  20          .614    -.118     .126    -.038     .408
  21          .589     .227     .057     .123     .417
  22          .608    -.107     .127    -.038     .399
  23          .687    -.044     .138     .098     .503
  24          .651     .177    -.212    -.017     .501
Variance     7.643    1.681    1.229     .911   11.464

principal components, it was decided to start with a minres solution. Assuming m = 5
and starting with the solution of Table 9.6, a maximum-likelihood solution was
reached in just ten iterations and less than five minutes on a Philco 2000. However,
this solution contained a Heywood case, the communality of variable 19 being greater
than one. Since a simple procedure is not available which will force the maximum- 
likelihood communalities within the permissible range, it was decided to start over, 
using a minres solution with m = 4 for the initial matrix (note the experience with 
this problem in 9.7). Again, a maximum-likelihood solution was reached in ten 
iterations and in less than five minutes. This solution, in canonical form, is shown in 
Table 10.9. The maximum-likelihood solution is almost identical with the minres 
solution: the communalities agree to within a unit in the second decimal place (only 
for test 11 is the difference .020); the differences in the contributions of factors are 
.003, .009, .011, and .005, respectively; and the 96 individual factor loadings, with 
only a few exceptions, differ mostly in the third decimal place. 
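A Heywood case of the kind met with m = 5 can be detected mechanically by computing each communality and flagging any that exceeds one. The following Python sketch shows such a check; the function name and the small loading matrix are hypothetical and are not the loadings of the rejected five-factor solution, which is not given in the text.

```python
def heywood_cases(loadings):
    """Return (variable index, communality) for every communality greater than one."""
    flagged = []
    for j, row in enumerate(loadings):
        h2 = sum(a * a for a in row)      # communality of variable j
        if h2 > 1.0:
            flagged.append((j, h2))
    return flagged

# Hypothetical two-factor loadings; the first variable has communality 1.06.
example = [
    [0.9, 0.5],
    [0.6, 0.3],
    [0.7, -0.4],
]
cases = heywood_cases(example)
```

A check of this sort only reports the condition; as the text notes, no simple procedure is available to force the communalities back into the permissible range.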
11.1. Introduction

In the preceding three chapters, methods were presented for the direct determina¬ 
tion of a factor solution for a given correlation matrix. A common attribute of those 
solutions was the orthogonality of the factors. In the present chapter another direct 
method for factoring a correlation matrix is developed, but the resulting factors 
usually are oblique to one another. The method of this chapter is essentially in the 
spirit of the simple factor models of chapter 7. The resulting solution may serve 
either as a final form or as a stepping stone for transformation to the multiple-factor 
solution (see part iii). 

The basic concepts involved in the multiple-group analysis are presented in 11.2.
This is followed by the theoretical development of the oblique solution in 11.3 and
the orthogonal solution in 11.4. While the formulas developed in these sections are
certainly necessary to the understanding of the method, they do not provide a ready
means for proceeding with the analysis. A schematic arrangement of the work, with
an outline of the steps in the computation, is given in 11.5. Finally, in Section 11.6
the complete numerical work is performed on an example of nine psychological 
variables involving three common factors. 

11.2. Concepts and Notation

The concept of group factors in factor analysis should not be confused with the 
group-factor method for reducing a correlation matrix to a factor matrix which 
satisfies the fundamental theorem (2.46). During the rapid development of factor 
analysis in the 1930’s, several workers dealt with group factors. Notable among 
these were Cyril Burt, who considered the “group factor method”; R. C. Tryon,
who proposed “cluster analysis”; and K. J. Holzinger, who developed the “bi-factor” 
method of analysis. While these methods certainly involve the group factor concept, 
they are not specifically in the spirit of factoring a matrix in terms of multiple factors 
concurrently. 
Horst [250] in 1937 anticipated the multiple-group method of factor analysis,
but he did not carry the theoretical work to the stage of practical application. 
Similarly, in 1944 Guttman [173] presented the theory without computational 
procedures. It was not too surprising that Holzinger [237], in the same year, and 
Thurstone [474], a year later, presented simple computing procedures for “group 
factor analysis,” without recognizing the similarity to the earlier theoretical develop¬ 
ments. The several independently developed “multiple group methods” for factor 
analysis are compared and synthesized by Harman [198]. 

The basic distinction of the multiple-group method is the ability to extract a number 
of common factors in one operation. In order to maximally capitalize on the power
of this method, it is necessary to select a number of linearly independent groups, 
which number approximates the rank of the reduced correlation matrix. This property 
has a function similar to the grouping of variables in the bi-factor method. 

Except in rare circumstances, the common factors extracted in a single operation 
are oblique to one another. Therefore another basic concept is that of the matrix of 
correlations among the factors. Also, since the factors are correlated, the immediate
results of a group-factor analysis must lead to two matrices, a factor pattern and a
factor structure: the first of these gives the coefficients of the factors in the linear
descriptions of the variables, while the second gives the correlations of the variables
with the factors.

These results of the group-factor analysis can be used to obtain a matrix of repro¬ 
duced correlations; and hence the residual matrix can be determined. If the residual 
matrix is not sufficiently close to the null matrix then the multiple-group method 
can be applied, again, to the residual matrix. 

After it is determined that the multiple-group solution adequately reproduces the 
observed correlations, some investigators may consider such a solution as a pre¬ 
liminary step to the “rotational” problem. In seeking “simple structure” (see 6.2)
by transformation of axes, the problem is simplified if an orthogonal frame of reference 
is first obtained. Hence, two additional concepts are introduced—an orthogonal factor 
matrix and the transformation matrix from the oblique to this orthogonal solution. 

The foregoing concepts, which arise in the multiple-group method of factoring, 
are listed in Table 11.1. Also summarized in this table are the symbols associated 
with these concepts. 

One of the principal differences that appears in the several developments of 
multiple-group methods of factor analysis really is concerned with the formulation 
of scientific hypotheses rather than with the method of analysis. Guttman and 
Holzinger suggest that the multiple-group methods be used in conjunction with some 
a priori psychological theory. Guttman emphasizes the fact that the computational 
procedures of multiple-group methods can be applied in any event, but most psycho¬ 
logical meaning can be gained only through the testing by the data of preconceived 
hypotheses. These hypotheses are reflected in the specific manner of grouping the 
variables and in the resulting common factors (usually oblique). 

Table 11.1

Concepts and Notation

Notation                                                  Concept
$R$                                                       Original correlations (reduced correlation matrix)
$G_p$  ($p = 1, 2, \ldots, m$)                            Groups of variables
$w_{jp}$  ($j = 1, 2, \ldots, n$; $p = 1, 2, \ldots, m$)  Sums of correlations of variables with Groups
$w_{pq}$  ($p, q = 1, 2, \ldots, m$)                      Sums of correlations among Groups
$S = (s_{jp})$                                            Oblique factor structure
$\Phi = (r_{T_pT_q})$                                     Correlations among factors
$P = (b_{jp})$                                            Oblique factor pattern
$R^{\dagger}$                                             Reproduced correlations
$R_m$                                                     Residual correlations (with m factors removed)
$T^{-1}$                                                  Transformation: oblique to orthogonal factor pattern
$A = (a_{jp})$                                            Orthogonal factor matrix

On the other hand, Thurstone emphasizes that the multiple-group method of
factoring is quite independent of the manner of grouping the variables, and in an
example, deliberately sets up groups in an [...] transformations. This thought is
evident when [...] calls attention to the “unnecessary” restrictions that [...]
correlations in order to use his “simple method of factor analysis.” These restrictions are
unnecessary when the object is simply to get a reduction of the correlation matrix to
a factor matrix by the expedient multiple-group method; they are not unnecessary
when the object is to test some [hypothesis] by use of a multiple-group solution.

Thus while Holzinger considers the multiple-group method suitable only if the
correlation matrix is amenable to [...] portions of approximate unit
rank and Guttman prefers to group the variables [...] to avoid or reduce the problem
of rotation of axes, Thurstone conceives of the multiple-group method primarily as
another (efficient) technique for [...] provide an orthogonal factor
matrix “which is the starting point [for the] rotational problem” [477, p. 171].

There is no doubt about the effectiveness of the multiple-group methods in
problems involving hand computation. Instead of extracting one factor at a time, and
computing a residual matrix after each, the basic theory of the multiple-group
method implies that only [one residual matrix need be] computed after extracting
several factors at one time. If a number of linearly independent groups is [...]
equal to the dimension of the common[-factor space] then only one residual matrix
will be necessary; otherwise if the number of clusters is too small, [...]. [If]
too many clusters should be selected then the [...] evident in the matrix of
correlations among group factors (and the inverse will not exist).

If the total number of common factors [is not] extracted in a single operation, then
the multiple-group [method can be applied] again to the residual matrix obtained
after the first operation, and [the process repeated] as many times as [necessary to]
bring the residuals down to [...] values. In the successive applications of the
multiple-group method, the common factors extracted at each stage are oblique [...]
to all factors [...] that if an a priori hypothesis [...] an oblique structure then all
the [...].

11.3. The Oblique Solution

[...] with an oblique solution appears in chapter 13. However, in that chapter the
[...]. The essence of the multiple-group [method is that of] representing the factors by
reference axes passing through the centroids of [the] respective [clusters] of variables.
Since the clusters of variables would not ordinarily be at right angles to one another,
the common factors obtained in a [...] multiple-group method are
oblique also. The multiple-group method [requires a] reduced correlation matrix as
the starting point, with communalities in the [principal] diagonal estimated by one of
the methods suggested in chapter 5. The [grouping of variables may be made by the]
method of B-coefficients of 7.4, or [...] on some a priori basis, [or on a]
purely arbitrary basis.

Since sums of variables (centroids) [or] composites are not necessarily
in [standard form], formulas for their variances and correlations have long been known [439].
In the multiple-group method, a composite [variable] $T_p$ is assumed through the cluster
of $n_p$ variables in a group $G_p$, namely

(11.1)   $T_p = \sum (z_k;\ k \in G_p) \qquad (p = 1, 2, \ldots, m)$.

Such composite variables constitute the oblique factors, which ordinarily would
not have unit variances.

The calculation of variances and [correlations] among the factors can be expedited
by the determination of certain preliminary [sums]:

(11.2)   $w_{jp} = \sum (r_{jk};\ k \in G_p) \qquad (j = 1, 2, \ldots, n;\ p = 1, 2, \ldots, m)$,

[...] of the variables in [...]. [A further sum employed in the multiple-group]
method is

(11.3)   $w_{pq} = \sum (w_{jq};\ j \in G_p) \qquad (p, q = 1, 2, \ldots, m)$,

which represents the sum of correlations between all variables in group $G_p$ with all
those in group $G_q$ (including the case where p = q). Of course, equation (11.3) can
be expressed in terms of the original correlations, as follows:

(11.4)   $w_{pq} = \sum (r_{jk};\ j \in G_p,\ k \in G_q) \qquad (p, q = 1, 2, \ldots, m)$.

The first task is to determine the correlations among the oblique factors and the 
correlations (structure values) of the variables with these factors. The required corre¬ 
lations can be expressed concisely by means of the foregoing sums of simple correla¬ 
tions, but first a formula will be obtained for the variances of the oblique factors. 
Ordinarily the variance of a composite variable $T_p$ would be given by:

(11.5)   $s_{T_p}^2 = n_p + 2 \sum (r_{jk};\ j, k \in G_p,\ j < k)$,

since there are $n_p$ self-correlations of unity. However, when the “self-correlations”
are replaced by communalities, this formula may be written

(11.6)   $s_{T_p}^2 = \sum (r_{jk};\ j, k \in G_p)$,

where it is understood that $r_{jj} = h_j^2$. Employing (11.4), the last expression becomes
simply

(11.7)   $s_{T_p}^2 = w_{pp}$.

Now, by definition the correlation between any two composite variables $T_p$ and
$T_q$ is

(11.8)   $r_{T_pT_q} = \sum T_{pi} T_{qi} / N s_{T_p} s_{T_q}$,

where summation with respect to i from 1 to N is understood. The s's in the
denominator have just been expressed in (11.7) in terms of the foregoing sums of
simple correlations. The remainder of (11.8) may be expanded and simplified as
follows:

(11.9)   $\sum T_{pi} T_{qi} / N = \sum \left( \sum_i z_{ji} z_{ki} / N;\ j \in G_p,\ k \in G_q \right)$
         $= \sum (r_{jk};\ j \in G_p,\ k \in G_q)$
         $= w_{pq}$.

Substituting the values from (11.7) and (11.9) into (11.8), the latter formula for the
correlation between two oblique factors becomes:

(11.10)  $r_{T_pT_q} = w_{pq} / \sqrt{w_{pp} w_{qq}}$.
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Next, the correlations of the variables with the factors—the oblique factor structure 
—can be obtained in terms of the foregoing sums of simple correlations. The structure 
value Sj p is the correlation r ZjTp of a variable in standard measure with a composite 
variable whose variance is given by (11.7). Then, it can readily be shown that the 
required correlation is given by the formula: 

( 11 . 11 ) S Jp = W ip /CKp- 
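The sums in (11.4), (11.10), and (11.11) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4-variable reduced correlation matrix `R`, the two groups, and the function names are hypothetical and are not taken from the text.

```python
from math import sqrt

def group_sums(R, groups):
    """W[p][q] = sum of r_jk over j in group p and k in group q, as in (11.4)."""
    return [[sum(R[j][k] for j in gp for k in gq) for gq in groups] for gp in groups]

def factor_correlations(W):
    """Correlations among the oblique factors, as in (11.10)."""
    m = len(W)
    return [[W[p][q] / sqrt(W[p][p] * W[q][q]) for q in range(m)] for p in range(m)]

def factor_structure(R, groups, W):
    """s_jp = (sum of r_jk over k in group p) / sqrt(W_pp), as in (11.11)."""
    return [[sum(R[j][k] for k in gp) / sqrt(W[p][p]) for p, gp in enumerate(groups)]
            for j in range(len(R))]

# Hypothetical reduced correlation matrix (communalities in the diagonal).
R = [
    [0.64, 0.48, 0.36, 0.28],
    [0.48, 0.36, 0.27, 0.21],
    [0.36, 0.27, 0.81, 0.63],
    [0.28, 0.21, 0.63, 0.49],
]
groups = [[0, 1], [2, 3]]  # zero-based indices of the variables in each group

W = group_sums(R, groups)
PHI = factor_correlations(W)
S = factor_structure(R, groups, W)
print(PHI)
print(S)
```

Because the diagonal of `R` holds communalities, `W[p][p]` is the variance (11.7) of the composite, so the diagonal of `PHI` comes out as unity.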

To complete the solution in terms of correlated factors, the linear descriptions of 
the variables in terms of the factors are required as well as their correlations with the 
factors. The coefficients in these linear equations, i.e., the pattern values, are the 
coordinates with respect to the oblique (factor) axes of the points representing the 
variables. The factor pattern can be obtained from the known factor structure S
and the correlations among the factors $\Phi$, according to (2.44), as follows:

(11.12)   $P = S\Phi^{-1}$.

The bulk of work implied in this formula is the determination of the inverse of $\Phi$.
The square root method of 3.5 can be used to obtain the inverse and to systematically
carry out the matrix multiplication to produce the pattern matrix. Actually, an
alternative computing procedure is developed in 11.5 and illustrated in 11.6.

If the analysis is to be terminated with an oblique solution then appropriate 
computing formulas are required for the reproduced correlations and the residuals. 
There are several formulas for getting the reproduced correlations directly from the 
component parts of an oblique solution. The basic formula (2.46), although not the 
most efficient one, involves the multiplication of three matrices, as follows: 

(11.13)   $R^{\dagger} = P\Phi P'$.

From the relationships between a pattern and a structure in an oblique solution
(see 2.8), this formula can be put in either of the following more convenient forms
for computation:

(11.14)   $R^{\dagger} = PS'$   or   $R^{\dagger} = SP'$.

In either of these forms the computation is precisely the same as in the case of an 
orthogonal solution, i.e., formula (2.50). 

The residual matrix with m factors removed is defined by 

(11.15)   $R_m = R - R^{\dagger}$,

where it is assumed that the reproduced correlations are based on the extraction of 
m factors by the multiple-group method. 
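The equivalence of the three forms of the reproduced correlations, and the residual matrix of (11.15), can be checked numerically. The following is a minimal sketch under stated assumptions: the pattern `P`, the factor correlations `PHI`, and the observed matrix `R` are hypothetical values chosen for illustration, and the helper names are not from the text.

```python
def matmul(X, Y):
    """Conventional row-by-column matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

# Hypothetical oblique pattern and factor correlations.
P = [[0.8, 0.0], [0.6, 0.0], [0.0, 0.9], [0.0, 0.7]]
PHI = [[1.0, 0.5], [0.5, 1.0]]
S = matmul(P, PHI)  # structure implied by the pattern, S = P Phi

R_basic = matmul(matmul(P, PHI), transpose(P))  # basic form (11.13)
R_ps = matmul(P, transpose(S))                  # first form of (11.14)
R_sp = matmul(S, transpose(P))                  # second form of (11.14)

# Hypothetical observed reduced correlations; one entry departs from the model.
R = [
    [0.64, 0.48, 0.38, 0.28],
    [0.48, 0.36, 0.27, 0.21],
    [0.38, 0.27, 0.81, 0.63],
    [0.28, 0.21, 0.63, 0.49],
]
residual = [[R[j][k] - R_ps[j][k] for k in range(4)] for j in range(4)]  # (11.15)
print(residual[0][2])
```

The two forms of (11.14) save one matrix multiplication relative to (11.13) once S is already at hand, which is the computational point made above.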

Before considering the methods for transforming the results obtained in this 
section to an orthogonal solution, it is well to consider the cogent argument, presented 
by Guttman [175, p. 215] for accepting the oblique multiple-group solution: 

A drawback is that if rotation is actually necessary, this implies that original hypotheses
about the nature of the common factors are wrong. A posteriori hypotheses, made after
inspection of the data, may be subject to all the uncertainties and controversies which beset
“the” [...] The results of a factor analysis would seem to be more trustworthy
if the [... variables] is chosen according to an a priori psychological theory
[...] by the data. This kind of procedure carries more scientific weight than
the procedure of constructing an a posteriori theory to fit already known facts. The latter is
implied by rotation of axes (whether or not the rotation is “blind”).

11.4. The Orthogonal Solution 

Whether an orthogonal solution is desired simply for completion of the multiple-group
analysis (expediting some of the computational labor) or as a stepping-stone
to the rotational problem, methods for obtaining it are outlined in this section. The
orthogonal solution is derived by transformation from the oblique solution described
in the preceding section. After the oblique solution has been obtained, the
transformation required is either from the pattern values ($b_{jp}$) or the structure values ($s_{jp}$)
to the orthogonal factor weights ($a_{jp}$).

Of all possible transformations from oblique to rectangular coordinate axes, the
special case employed in the multiple-group analysis has the following properties:
the first axis of the rectangular system coincides with the first oblique-factor axis, the second
is in the plane of the first two oblique axes and orthogonal to the first, etc.*

* In formal mathematics, this transformation is known as the Gram-Schmidt process and
yields the orthogonalization of a matrix [385, p. 78].

This type of transformation, for the case of only two factors, is illustrated in Figure 11.1. For
the sake of simplicity, a general point is designated P instead of $P_j$, and its coordinates
are shown as ($b_1$, $b_2$) instead of ($b_{j1}$, $b_{j2}$) in the oblique reference system and as ($a_1$, $a_2$)
instead of ($a_{j1}$, $a_{j2}$) in the rectangular reference system. The point P (or the vector
from the origin to P) can be represented in the common-factor space as follows
(see 4.10):

(11.16)   $z'' = b_1 T_1 + b_2 T_2$,

or

(11.17)   $z'' = a_1 F_1 + a_2 F_2$,


depending upon which reference system is employed. In the oblique reference system, 
the (orthogonal) projections on the coordinate axes are distinct from the coordinates, 
and are given by the structure values: 


(11.18)   $s_1 = r_{zT_1}$   and   $s_2 = r_{zT_2}$.


For the special type of transformation from oblique to rectangular axes with the 
properties listed above, the relationships between the two sets of coordinates are 
given by [201, p. 19]: 


(11.19)   $a_1 = b_1 + b_2 \cos\theta_{12}$,
          $a_2 = b_2 \sin\theta_{12}$,

where $\theta_{12}$ is the angle between the oblique axes $T_1$ and $T_2$. As noted in 4.9, the cosine
of the angle between two vectors is the correlation of the variables represented by
such vectors. Hence, $\cos\theta_{12} = r$, where for simplicity $r$ is used to designate $r_{T_1 T_2}$.
Then the foregoing transformation equations may be written:

(11.20)   $a_1 = b_1 + b_2 r$,
          $a_2 = b_2 \sqrt{1 - r^2}$.


Although the example covers only two variables there is a heuristic value to
expressing the transformation in matrix form:

(11.21)   $(a_{j1}\ a_{j2}) = (b_{j1}\ b_{j2}) \begin{bmatrix} 1 & 0 \\ r & \sqrt{1 - r^2} \end{bmatrix}$.
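A short numerical check of the two-factor transformation may help. The sketch below applies (11.20) to one point and confirms that the squared length of its vector (the communality) is the same in both reference systems. The input values and the function name are hypothetical; the oblique-coordinate length formula $b_1^2 + b_2^2 + 2 b_1 b_2 r$ is the standard expression for axes at correlation $r$.

```python
from math import sqrt

def oblique_to_orthogonal(b1, b2, r):
    """Apply (11.20): rectangular coordinates from oblique coordinates."""
    a1 = b1 + b2 * r
    a2 = b2 * sqrt(1.0 - r * r)
    return a1, a2

# Hypothetical oblique coordinates and factor correlation.
b1, b2, r = 0.5, 0.4, 0.6
a1, a2 = oblique_to_orthogonal(b1, b2, r)

# Squared vector length in each reference system.
communality_oblique = b1 * b1 + b2 * b2 + 2.0 * b1 * b2 * r
communality_orthogonal = a1 * a1 + a2 * a2

# Projection on the first oblique axis (structure value s_1 = b_1 + b_2 r).
s1 = b1 + b2 * r
print(a1, a2, communality_oblique, communality_orthogonal)
```

Since the first rectangular axis coincides with the first oblique axis, `a1` equals the structure value `s1`, which anticipates the general result that A can be computed from S.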


The transformation matrix that carries the coordinates of the oblique reference
system into the coordinates of the rectangular reference system can be derived from
the matrix of correlations among the oblique factors. By applying the square root
operation (see 3.4) on the matrix $\Phi$ of factor correlations, the resulting “square root”
matrix $T'$ is found to be precisely the transformation matrix, as in (11.21). The general
square root operation,

(11.22)   $\Phi = T'T$,

reduces to the following explicit form for the case of two variables:

(11.23)   $\begin{bmatrix} 1 & r \\ r & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ r & \sqrt{1 - r^2} \end{bmatrix} \begin{bmatrix} 1 & r \\ 0 & \sqrt{1 - r^2} \end{bmatrix}$.


The expression (11.21) is immediately generalizable to n variables and m factors, 
viz., 

(11.24) A = PT', 

where A and P are n by m matrices ($a_{jp}$) and ($b_{jp}$), respectively; and T, a square matrix
of order m, is derived by the square root operation on $\Phi$. While (11.24) provides the
transformation from oblique coordinates to the specified orthogonal coordinates,
it is desirable to express the transformation in terms of the oblique structure values,
because those are determined first in the course of the analysis. To accomplish this,
substitute the expression for the oblique factor pattern matrix from (11.12) into
(11.24), obtaining

(11.25)   $A = S\Phi^{-1}T'$.

Then from the square root operation (11.22) on $\Phi$, the foregoing equation becomes

          $A = ST^{-1}(T')^{-1}T' = ST^{-1}I$,

or

(11.26)   $A = ST^{-1}$,

which is precisely the desired transformation from the oblique structure values to the
orthogonal factor pattern. Stated another way, the transformation (11.26) to the
orthogonal frame of reference (F's) is described in terms of projections ($s_{jp}$) on the
oblique axes instead of coordinates ($b_{jp}$) in the original oblique reference system.

11.5. Multiple-Group Factor Algorithm 

The methods described in the last two sections can be summarized and organized 
in a convenient computing form. Such a schematic form is presented in Table 11.2. 
The principal parts of this worksheet consist of: (1) the stub, listing the variables; 
(2) the heading, giving brief instructions; and (3) the body of the table proper, made 
up of three vertical sections with three blocks in each section. Also, at the bottom 
of the table, there are instructions for checking the numerical operations. 

What this computing form accomplishes is the determination of both the oblique 
factor pattern and the orthogonal factor solution, starting with the intercorrelations 
among the factors and the oblique factor structure. The work is begun by recording
in the first vertical section the three known matrices: (1) the m x m matrix of
correlations among the factors; (2) an identity matrix of the same order; and (3) the n x m
oblique factor structure.




The next step involves the application of the square root method (see 3.4) to the 
first vertical section to get the middle section. The first block, of course, yields the 
square root matrix of the original matrix of factor correlations. The next block, 
obtained by applying the square root operator to the identity matrix, contains the
transformation matrix $T^{-1}$. In the third block, the orthogonal factor solution is
determined by the same square root operation. The fact that the square root operation
on the matrix S yields the matrix A can be seen from equation (11.26).


Table 11.2

Multiple-Group Factor Algorithm

Instructions:

Variable | Original Matrices (m columns) | Square Root Operation ($T^{-1}$) on Preceding Block (m columns) | Row-by-row Multiplication of Preceding Block with $T^{-1}$, i.e., Postmultiplication by $(T')^{-1}$ (m columns)
---------|-------------------------------|----------------------------------------------------------------|----------------------------------------------------------------
$T_1, T_2, \dots, T_m$ | Factor Correlations: $\Phi$ | Square Root Matrix: $T'$ ($\Phi = T'T$) | (Result is I, except for rounding errors; may be left blank, or used as check) ($T'(T')^{-1} = I$)
$T_1, T_2, \dots, T_m$ | Identity Matrix: I | Transformation Matrix: $T^{-1}$ ($IT^{-1} = T^{-1}$) | Inverse of Initial Matrix: $\Phi^{-1}$ ($T^{-1}(T')^{-1} = \Phi^{-1}$) (not required explicitly)
$1, 2, 3, \dots, n$ | Oblique Factor Structure: S | Orthogonal Factor Matrix: A ($ST^{-1} = A$) | Oblique Factor Pattern: P ($A(T')^{-1} = P$)
Total | Sum of all (2m + n) elements in each column | Sum of all (2m + n) elements in each column | Sum of all (2m + n) elements in each column
Check | (No check in this block) | Square root operation applied to totals of preceding block | Row-by-row multiplication of totals in preceding blocks with $T^{-1}$


It should be noted that the arrangement of work in Table 11.2 is transposed from 
that in Table 3.2. In Section 3.4, the square root method is presented with the square 
root matrix below the original matrix, while here it is placed to the right. Because 
of this transposition of the blocks, the computing formulas of 3.4 must also be 
transposed. For example, the principal formulas for the values in the first three
columns of the middle block become:

(11.27)   $s_{j1} = r_{j1} / s_{11}$    $(j > 1)$,

(11.28)   $s_{j2 \cdot 1} = (r_{j2} - s_{j1} s_{21}) / s_{22 \cdot 1}$    $(j > 2)$,

(11.29)   $s_{j3 \cdot (2)} = (r_{j3} - s_{j1} s_{31} - s_{j2 \cdot 1} s_{32 \cdot 1}) / s_{33 \cdot (2)}$    $(j > 3)$,

in place of formulas in Steps 3, 5, and 7 for the successive rows in the arrangement
of 3.4.
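The column-by-column pattern of (11.27) to (11.29) can be sketched as a single routine that is applied to every row of the stacked blocks of Table 11.2. The code below is an illustrative sketch, not the worksheet procedure itself: the function name and data layout are assumptions, while the factor correlations and the structure row for variable 1 are the values printed in Table 11.7.

```python
from math import sqrt

def square_root_operation(phi, blocks):
    """Return T' (lower triangular, phi = T'T) and each block postmultiplied by T^{-1}.

    Every output column k follows the pattern of (11.27)-(11.29):
    x_jk = (r_jk - sum over l < k of x_jl * t_kl) / t_kk.
    """
    m = len(phi)
    t = [[0.0] * m for _ in range(m)]  # square root matrix T'
    for k in range(m):
        for j in range(k, m):
            partial = phi[j][k] - sum(t[j][l] * t[k][l] for l in range(k))
            t[j][k] = sqrt(partial) if j == k else partial / t[k][k]
    results = []
    for block in blocks:
        out = []
        for row in block:
            x = []
            for k in range(m):
                x.append((row[k] - sum(x[l] * t[k][l] for l in range(k))) / t[k][k])
            out.append(x)
        results.append(out)
    return t, results

PHI = [[1.0, 0.6511, 0.4640], [0.6511, 1.0, 0.5324], [0.4640, 0.5324, 1.0]]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
S_ROW_1 = [[0.90, 0.52, 0.37]]  # structure values of variable 1, two decimals

t_prime, (t_inv, a_rows) = square_root_operation(PHI, [IDENTITY, S_ROW_1])
print(t_prime)
print(t_inv)
print(a_rows)
```

Applying the operation to the identity block yields $T^{-1}$, and applying it to a row of S yields the corresponding row of A, in line with (11.26). Agreement with the printed A values can only be expected to about the second decimal, since the structure values are entered here to two places.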


Finally, the third vertical section is obtained by simple matrix multiplication.
Each block (matrix) in the middle section is multiplied, row-by-row, by the
transformation matrix $T^{-1}$ (which is set off by the bold lines) to obtain the corresponding
block in the third section. The first block need not be computed, except if desired
for an additional check. The next block, which gives $\Phi^{-1}$, again is not required
explicitly, but results without cost as a by-product of the square root method. The
required oblique factor pattern is obtained in the last block of the algorithm by
postmultiplication of the orthogonal factor matrix by the transpose of the
transformation matrix,* i.e.,

(11.30)   $P = A(T')^{-1}$.

The fact that the row-by-row multiplication of A by $T^{-1}$ actually yields P can be
derived from (11.24) by postmultiplying both sides of that equation by $(T')^{-1}$.
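The third vertical section can be illustrated with the values of $T^{-1}$ printed in Table 11.7. The sketch below is an assumption-laden illustration: the helper `row_by_row` is a hypothetical name, and the single row of A is entered as printed (two decimals), so the resulting pattern values agree with the table only to within the rounding of those inputs.

```python
def row_by_row(X, Y):
    """Row-by-row multiplication: element (i, j) is row i of X times row j of Y (X Y')."""
    return [[sum(a * b for a, b in zip(rx, ry)) for ry in Y] for rx in X]

# Transformation matrix T^{-1} as printed in Table 11.7.
T_INV = [
    [1.0, -0.8578, -0.2448],
    [0.0, 1.3175, -0.4803],
    [0.0, 0.0, 1.2015],
]
A_ROWS = [[0.90, -0.09, -0.03]]  # orthogonal factor weights of variable 1, as printed

PHI_INV = row_by_row(T_INV, T_INV)  # T^{-1} (T')^{-1}, the inverse of Phi
P_ROWS = row_by_row(A_ROWS, T_INV)  # (11.30), P = A (T')^{-1}
print(PHI_INV)
print(P_ROWS)
```

The by-product `PHI_INV` reproduces the inverse block of Table 11.7 to about four decimals, which is the sense in which the inverse "results without cost" from the square root method.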

After the computations outlined in Table 11.2 have been made, it is a simple matter
to get the matrix of reproduced correlations, and hence the residual matrix.
Row-by-row multiplication of A by itself yields the matrix of reproduced correlations
according to (2.50). An alternative procedure is row-by-row multiplication of S by P
according to (11.14).

The computing procedure involving the oblique solution of 11.3, followed by the 
square root method outlined in Table 11.2, will be found very efficient for factoring if 
an electronic computer is not available. It might still be desirable to get a multiple- 
group solution with the aid of a computer in some instances (e.g., to explore a specific 
hypothesis involving groups of variables). A computer program for triangular 
decomposition [353] may be very useful for such cases. In general, however, when 
an electronic computer is available, the principal-factor, minres, or maximum- 
likelihood methods would be preferred. 

11.6. Numerical Illustration 

The calculations involved in the multiple-group solution will be illustrated with a 
nine-variable example. The reduced correlation matrix, with the communalities 

* As indicated in 3.2, paragraph 18, the row-by-row multiplication of a matrix A by another
matrix B is equivalent to the conventional row-by-column multiplication of A by B'.

Table 11.3

Intercorrelations of Nine Psychological Variables^a

Variable               |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9
-----------------------|-----|-----|-----|-----|-----|-----|-----|-----|-----
1. Word meaning        | .81 |     |     |     |     |     |     |     |
2. Sentence completion | .75 | .69 |     |     |     |     |     |     |
3. Odd words           | .78 | .72 | .75 |     |     |     |     |     |
4. Mixed arithmetic    | .44 | .52 | .47 | .91 |     |     |     |     |
5. Remainders          | .45 | .53 | .48 | .82 | .74 |     |     |     |
6. Missing numbers     | .51 | .58 | .54 | .82 | .74 | .74 |     |     |
7. Gloves              | .21 | .23 | .28 | .33 | .37 | .35 | .35 |     |
8. Boots               | .30 | .32 | .37 | .33 | .36 | .38 | .45 | .58 |
9. Hatchets            | .31 | .30 | .37 | .31 | .36 | .38 | .52 | .67 | .77

^a Taken from K. J. Holzinger’s unpublished class notes involving a study of 696 cases, 12 tests, and 4 factors.


estimated by single triads of equation (5.33), is contained in Table 11.3. As the first
step in the analysis, the nine variables are placed in three groups, as follows:

$G_1$: (1, 2, 3), verbal;

$G_2$: (4, 5, 6), arithmetic;

$G_3$: (7, 8, 9), spatial relations.

This grouping is justified by the nature of the respective variables and verified by
B-coefficients:

B(1, 2, 3) = 187, B(4, 5, 6) = 186, and B(7, 8, 9) = 168.

The sums (11.2) of the correlations of each variable with all the variables in each 
group are recorded in Table 11.4. Then the sums (11.3) of these sums by groups are 
given in Table 11.5. From these two tables the correlations among the factors, and 
between the factors and the variables, can be computed readily. By use of formula 
(11.10), the following matrix of factor correlations is obtained: 


          1.0000    .6511    .4640
           .6511   1.0000    .5324
           .4640    .5324   1.0000


Similarly, by application of formula (11.11), the correlations of the variables with 
the factors are obtained and presented in Table 11.6. 

The remainder of the multiple-group analysis is performed by means of the
algorithm described in 11.5. This worksheet for the nine-variable example appears
in Table 11.7. The matrix $\Phi$ of correlations among the factors and the oblique factor
structure S of Table 11.6 provide the starting point for the square root operation
and subsequent matrix multiplications performed in Table 11.7. The end products
are the orthogonal factor matrix A, in the lower block of the middle section, and the
oblique factor pattern P, in the lower right-hand corner. After finishing this analysis,
the adequacy of the solution is judged from a comparison of the matrix of reproduced
correlations with the reduced correlation matrix. The reproduced correlations may
be computed by row-by-row multiplication of A by itself, and checked by row-by-row
multiplication of S by P. A comparison of the original and reproduced correlations
shows only one instance of a residual as high as .02. Obviously, the multiple-group
solution provides an excellent fit to the observed data of Table 11.3. The adequacy
of the results, of course, is independent of whether the orthogonal or the oblique
solution is preferred.


Table 11.7

Computation of Orthogonal Factor Matrix A and Oblique Factor Pattern P

(Following multiple-group factor algorithm of Table 11.2)

Variable | Original Matrices          | Square Root Operation $T^{-1}$ | Row-by-row Multiplication with $T^{-1}$
---------|----------------------------|--------------------------------|----------------------------------------
$T_1$    | 1.0000    a        a       | 1.0000                         |  1.0000    0        0
$T_2$    |  .6511   1.0000    a       |  .6511    .7590                |   .0000   1.0000    0
$T_3$    |  .4640    .5324   1.0000   |  .4640    .3034    .8323       |  -.0000   -.0000   1.0000
$T_1$    | 1.0000    0        0       | 1.0000   -.8578   -.2448       |  1.7957    a        a
$T_2$    |  0       1.0000    0       |  0       1.3175   -.4803       | -1.0126   1.9665    a
$T_3$    |  0        0       1.0000   |  0        0       1.2015       |  -.2941   -.5771   1.4436
         | Oblique factor structure: S | Orthogonal factor matrix: A    | Oblique factor pattern: P
1        |  .90      .52      .37     |  .90     -.09     -.03         |   .98     -.09     -.04
2        |  .83      .61      .38     |  .83      .09     -.04         |   .76      .14     -.05
3        |  .87      .56      .46     |  .87     -.01      .07         |   .86     -.04      .08
4        |  .55      .96      .43     |  .55      .79     -.07         |  -.11     1.07     -.08
5        |  .56      .86      .49     |  .56      .65      .04         |  -.01      .84      .04
6        |  .63      .86      .50     |  .63      .60      .03         |   .11      .77      .04
7        |  .28      .39      .59     |  .28      .27      .45         |  -.06      .14      .55
8        |  .38      .40      .76     |  .38      .20      .63         |   .05     -.04      .76
9        |  .38      .39      .88     |  .38      .19      .77         |   .03     -.12      .93
Total    | 8.50     8.73     7.86     | 8.50     4.21     3.16         |  4.10     4.05     3.80
Check    |                            | 8.50     4.21     3.17         |  4.12     4.03     3.80

a Terms above the diagonal of a symmetric matrix are deleted for simplicity. These terms, however, must be
included in the totals in order for the checks to apply.

12

Different Solutions in Common-Factor Space


12.1. Introduction 

In the five preceding chapters, various methods were proposed for reducing a
correlation matrix to a factor matrix which in some reasonable sense reproduces the
original correlations. The factor solutions could be viewed as final products, in
their own right; or they could be conceived as initial products satisfying the
fundamental factor theorem of reproducing the correlation matrix, and requiring further
manipulation to a final form. The basic philosophy for transforming a preliminary
solution into a final multiple-factor type of solution is covered in 6.1 and 6.5. There
it is made clear that once the common-factor space has been determined, an infinitude
of rotations is possible from one coordinate system to another without any effect
on the adequacy of solution.

The multiple-factor solution has been defined essentially in intuitive terms only 
and so has not lent itself to precise mathematical formulation. This preferred type of 
solution rests upon the principles of “simple structure” set forth in 6.2. Attempts to 
objectify these principles are made in chapters 14 and 15, thereby making possible 
the determination of multiple-factor solutions by analytical methods. Since these 
methods involve very extensive computations, their very conception had to await 
the development of the large electronic computers. Before considering such methods, 
there is a pedagogic value in reviewing the more modest techniques for obtaining a 
multiple-factor solution involving subjective, graphical transformations from some 
arbitrary initial solution. Such techniques for getting an orthogonal multiple-factor 
solution are the principal topics of this chapter, while manual methods for getting 
an oblique solution are covered in the next chapter. 

Before considering the multiple-factor solution proper, the general theory of
relationships among solutions in a given common-factor space is developed in 12.2.
Then graphical methods for rotating an initial solution to a multiple-factor solution
are presented in 12.3, and numerical illustrations given in 12.4. Finally, certain
problems are considered in 12.5 which involve comparisons of factor solutions
when either the variables or the samples are altered.

12.2. Relationship between Two Known Solutions 

Several types of comparisons among factor solutions are of interest, both from a 
methodological viewpoint and from the viewpoint of establishing basic factors in a 
content field, such as psychology. The most obvious, of course, is a comparison of any 
two solutions derived from the same correlation matrix [240]. That is the subject of 
the present section. Other comparisons are discussed in 12.5. 

In mathematics the reference system plays a very minor role: the particular
configuration of points is of prime importance, and the coordinate system is of much
lesser significance. Thus, if it is desired to describe an ellipse, i.e., get an algebraic
equation for the ellipse, it is quite irrelevant whether rectangular Cartesian
coordinates, nonrectangular Cartesian coordinates, or polar coordinates are employed.
Furthermore, the particular orientation of axes is immaterial. With each change of
the coordinate system, of course, the equation of the ellipse will generally change,
but the fact remains that the equation in each case describes the ellipse with respect
to the given reference system.

The object of factor analysis, on the other hand, is the selection of an appropriate 
frame of reference, the configuration of points representing the variables being of 
lesser significance. Then the indeterminateness of the factor problem is obvious. In 
selecting a particular reference system, the unit vectors along the coordinate axes 
represent the factors, and, since the reference system can be rotated about its origin 
in an infinitude of ways in the common-factor space, there arises an infinite number
of factor systems for a given body of data.

As a general procedure, it might be advisable to put any factor solution in some
standard form, e.g., the canonical form of 8.8. Then, if two independent solutions are 
each brought to canonical form, it will be obvious whether they are indeed alike or 
not (see 10.6). When it is desired to study the relationships between two distinct factor 
solutions, that may be accomplished by finding a matrix of transformation which 
carries the coordinates of one into the other. Thus, if the first factor pattern is denoted 
by A and the second factor pattern by B, then the problem is to find a matrix T 
such that 

(12.1) AT = B 

The matrix A represents the coordinates $a_{jp}$ of the n points with respect to one set
of m common-factor axes, say $F_1, F_2, \dots, F_m$; the matrix B represents the coordinates
$b_{jp}$ of the points with respect to a new set of axes, say $K_1, K_2, \dots, K_m$; while the
matrix T represents the transformation of the coordinates in A to those in B.

If the number of factors is equal to the number of variables then the solution for
T is simply

(12.2)    $T = A^{-1}B$.

Since the number of common factors is usually much smaller than the number of
variables, however, the matrix A does not have an inverse and T cannot be calculated
directly. It may be noted that, for any matrix A for which $A'A$ is nonsingular,

(12.3)    $(A'A)^{-1}(A'A) = I$,

and hence, if both members of (12.1) are premultiplied by $(A'A)^{-1}A'$, there results

(12.4)    $T = (A'A)^{-1}A'B$.

This formula gives the desired matrix of transformation. 
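Formula (12.4) can be tried out on a small case. The sketch below is written in plain Python under stated assumptions: the 4 x 2 pattern `A` and the transformation `T_TRUE` are hypothetical, `B` is constructed as `A` times `T_TRUE` so that both solutions lie in the same common-factor space, and the helper names (including the small Gauss-Jordan inverse) are illustrative only.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting, adequate for small matrices."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def transformation(A, B):
    """T = (A'A)^{-1} A'B, as in (12.4)."""
    At = transpose(A)
    return matmul(matmul(inverse(matmul(At, A)), At), B)

# Hypothetical first solution and transformation; B is built so that AT = B holds.
A = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]]
T_TRUE = [[0.6, 0.8], [0.8, -0.6]]
B = matmul(A, T_TRUE)

T = transformation(A, B)
print(T)
```

When B is not an exact transform of A, the same expression gives the least-squares fit of AT to B rather than an exact solution; that reading is a general property of the formula and is not a claim made in the text.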

The relationships among the factors themselves (or the factor measurements) may
also be obtained by means of the matrix T. In the given common-factor space the
column vectors of the first and second sets of factors may be taken to be $f = \{F_1\ F_2 \cdots F_m\}$
and $k = \{K_1\ K_2 \cdots K_m\}$, respectively. From the definitions of the two factor patterns
and their assumed equality, it follows that

(12.5)    $Af = Bk$.

Premultiplying both sides of this equation by $(A'A)^{-1}A'$, and again employing (12.3),
there results

(12.6)    $f = (A'A)^{-1}A'Bk$.

By making use of (12.4), this expression finally simplifies to

(12.7)    $f = Tk$.

This is the matrix formulation of the relationships between the F factors and the 
K factors. 

A system of equations represented by (12.1) gives the actual transformation of 
coordinates between the two factor solutions. Thus the factor weights for any variable 
in one solution are expressed linearly in terms of the weights of the other solution. 

An alternative way of expressing the relationship between two factor solutions is
afforded by a system of equations implied by (12.7). Thus, the contributions of the K
factors to the variance of each F factor can be obtained. Also, equations (12.7) may
be used to estimate the measurements of F factors from known equations of
measurements of the K factors.

A detailed numerical illustration of these procedures will now be given, employing
the example of thirteen psychological tests. The relationships will be developed
between the multiple-factor solution of Table 12.5 and an oblique primary-factor
solution for the same data (see ex. 8, chap. 13). A schematic worksheet is presented
in Table 12.1 along with the specific example. The matrix of coefficients of the
multiple-factor pattern is denoted by B and appears in the extreme right-hand block
of the table, and just to the left of it is the oblique factor pattern denoted by A. These
two solutions may be considered equivalent (or in the same common-factor space)
since both solutions were obtained by transformation of a centroid solution.

After the known factor patterns are recorded in the right half of Table 12.1, the
computations leading to the transformation matrix T are performed in the left half
of the table. First, the product $A'A$ is determined and placed in the upper left block
of the table; and the inverse of this product is calculated by the square root method
(see 3.5). This provides the first part of the expression (12.4) for T. In place of the
second part of (12.4), its transpose is determined (primarily for computational
convenience). This product ($B'A$) is recorded in the block below $(A'A)^{-1}$. Finally,
the transpose of the desired transformation matrix (12.4) is computed in the extreme
lower left corner of the table.

Table 12.1

Transformation Between Solutions in a Fixed Common-Factor Space

A. Computing Algorithm

Calculation of Transformation Matrix:

  $A'A$: col.-by-col. multiplication of A by itself, recorded beside I.
  Square root method of Table 3.3 is applied to compute $(A'A)^{-1}$.
  $(A'A)^{-1}$.
  $T' = (B'A)(A'A)^{-1}$: row-by-row multiplication of $(B'A)$ by $(A'A)^{-1}$.
  $B'A$: col.-by-col. multiplication of B by A.

Known Factor Patterns, for variables j = 1, 2, ..., n:

  A = ($a_{j1}\ a_{j2} \cdots a_{jm}$)
  B = ($b_{j1}\ b_{j2} \cdots b_{jm}$)

B. Example of Thirteen Psychological Tests

Calculation of Transformation Matrix^a

  $A'A$ and I:
     1.823   -.232   -.058   |   1      0      0
      a      3.394   -.017   |          1      0
      a       a      2.103   |                 1

  Square root operation:
     1.350   -.172   -.043   |   .741    0       0
             1.834   -.013   |   .069   .545     0
                     1.449   |   .023   .005    .690

  $(A'A)^{-1}$:
      .554    .038    .016
      a       .297    .003
      a       a       .476

  $T' = (B'A)(A'A)^{-1}$:
      .250    .883    .070
      .198    .314    .962
      .950    .344    .262

  $B'A$:
      .246   2.939    .121
      .230   1.008   2.006
     1.635    .944    .490

Known Factor Patterns

         | Primary Factor Pattern^b      | Multiple-Factor Pattern^c
Variable | $a_{j1}$  $a_{j2}$  $a_{j3}$  | $b_{j1}$  $b_{j2}$  $b_{j3}$
---------|-------------------------------|-------------------------------
 1       |  .731      .089      .142     |  .096      .248      .706
 2       |  .441      .004      .004     |  .102      .089      .425
 3       |  .721      .090      .142     |  .136     -.011      .602
 4       |  .508      .090      .003     |  .193      .121      .516
 5       | -.058      .801      .087     |  .702      .324      .243
 6       |  .037      .809     -.051     |  .719      .214      .300
 7       | -.068      .901     -.030     |  .778      .243      .237
 8       |  .155      .591      .078     |  .562      .291      .372
 9       | -.068      .919     -.081     |  .791      .198      .231
10       | -.385      .164      .809     |  .119      .757     -.100
11       | -.039      .077      .659     |  .109      .651      .161
12       |  .073     -.177      .773     | -.083      .703      .210
13       |  .351     -.061      .594     |  .070      .619      .469

^a Terms below the diagonal of a symmetric matrix are deleted for simplicity, while blanks actually denote zeros.
^b Obtained by the graphical methods of chapter 13 in exercises 2-8 for that chapter.
^c From Table 12.5.

The actual transformation from the primary factor coordinates to the multiple- 
factor coordinates is given by 

          $b_{j1} = .250a_{j1} + .883a_{j2} + .070a_{j3}$,

(12.8)    $b_{j2} = .198a_{j1} + .314a_{j2} + .962a_{j3}$,

          $b_{j3} = .950a_{j1} + .344a_{j2} + .262a_{j3}$,

where the a’s and h’s are the coordinates in the matrices A and B, respectively. It is 
evident from these equations that the coefficients of the first multiple factor can be 
described mostly in terms of the coefficients of the second primary factor; the second 
multiple factor in terms of the third primary factor; and the third multiple factor in 
terms of the first primary factor. 

Similarly, by means of the matrix T, the relationships among the factors may be 
exhibited as follows: 


          $T_1 = .250M_1 + .198M_2 + .950M_3$,

(12.9)    $T_2 = .883M_1 + .314M_2 + .344M_3$,

          $T_3 = .070M_1 + .962M_2 + .262M_3$,

where the T’s represent the (oblique) primary factors and the M’s the (orthogonal)
multiple factors. From these equations it is apparent that each of the oblique factors
consists primarily of one of the multiple factors, with some slight contribution of each
of the other two. For example, the factor $M_3$ contributes 90 per cent to the unit
variance of $T_1$, while $M_1$ and $M_2$ contribute only 6 per cent and 4 per cent, respectively.
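The percentage contributions quoted above follow from squaring the coefficients, since the multiple factors are orthogonal and in standard form. A minimal sketch, using the coefficients of the transformation as printed in Table 12.1 (the variable names are illustrative):

```python
# Rows: primary factors T_1, T_2, T_3; columns: multiple factors M_1, M_2, M_3.
T = [
    [0.250, 0.198, 0.950],
    [0.883, 0.314, 0.344],
    [0.070, 0.962, 0.262],
]

# With orthogonal, unit-variance M's, each squared coefficient is the share of
# variance of the primary factor contributed by that multiple factor.
contributions = [[100.0 * c * c for c in row] for row in T]
row_totals = [sum(row) for row in contributions]

print([round(v) for v in contributions[0]])
print([round(v, 1) for v in row_totals])
```

The row totals come out close to, but not exactly, 100 per cent because the printed coefficients are rounded to three decimals.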

While the foregoing development may seem to be somewhat trivial, there was a 
period of time in the development of factor analysis when such relationships were 
not completely understood. As noted in chapter 1, it may have been the failure to 
recognize the fact that a given matrix of correlations could be factored in an infinite 
number of different ways that led to the many controversies about the “true”, the 
“best”, or the “invariant” solution for a set of data. It is now obvious that any two 
factorizations of a given set of data are related by (12.1), where the transformation 
from one to the other solution is given by (12.4). 

In the problem just treated, both solutions A and B were assumed to be known,
and the relationship between them (given by the transformation matrix T) was to be
determined. The general task of obtaining a derived solution, which is the subject
of the entire part iii of this text, involves a somewhat different problem. The initial
pattern A is given, and some rules about the nature of the final solution B are specified,
and from this a matrix of transformation T is built up which in fact leads to B. If the
model for B is specified in exact mathematical terms, there is no problem other than
the computing effort in arriving at the final solution and the transformation matrix.
On the other hand, if the nature of B is only vaguely identified, then there are many
subjective solutions. This distinction will become clearer in the course of the
development in the remainder of this chapter and in the following three chapters.

12.3. Graphical Procedures for Orthogonal Multiple-Factor Solution 

Subjective procedures are proposed in this section for developing the
transformation from some initial solution to a multiple-factor form of solution. The method
consists, essentially, of the build-up of a series of rotations in a plane. The angle of
rotation at each stage is selected by inspection of a graph and judgment as to the
attainment of the objectives of simple structure (see 6.2).

To assist in the development of orthogonal transformations, the following summary 
of notation will be found convenient: 

          $A = (a_{jp})$, initial factor matrix, with factors $F_p$,

(12.10)   $B = (b_{jp})$, final factor matrix, with factors $M_p$,

          $T = (\lambda_{qp})$, orthogonal transformation matrix, with $\theta_{qp}$ the
          angle of rotation in the plane of the original factor p
          and the final factor q.

The range of j is over the n variables, and the range of p and q is over the m factors. 

When an initial factor pattern involves only two factors, the variables may be 
represented as points in a plane, with the coefficients of the factors as the coordinates. 
Then a transformation to some other form of solution implies the representation of 
these points with respect to the axes denoting the new factors. Such a transformation 
is merely a rotation of axes in this common-factor space. Thus, any point may be 
referred to either system of reference, $F_1, F_2$ or $M_1, M_2$, as shown in Figure 12.1. For 
the sake of simplicity, a general point is designated $P$ instead of $P_j$, and its coordinates 
in the original frame of reference are shown as $(a_1, a_2)$ instead of $(a_{j1}, a_{j2})$, and in the 
new reference system as $(b_1, b_2)$ in place of $(b_{j1}, b_{j2})$. The problem is to express the 
new b-coordinates in terms of the original a-coordinates when the reference axes 
are rotated through an angle $\theta$ from the original to the new. 

The required transformation is accomplished by making use of the following 
property on projections of lines: the sum of the projections upon a straight line of the 
segments of any broken line connecting two points is equal to the corresponding sum 
for any other broken line connecting the same two points. Here the origin O and any 
point P are joined by two broken lines, namely, ORP and OSP. It follows that the 
projections of these broken lines along any direction are equal, i.e., 


(12.11)      Proj. OR + Proj. RP = Proj. OS + Proj. SP. 

If the direction is taken, first, as the positive axis of $M_1$ and, second, as the positive 
axis of $M_2$, the resulting expressions are 

(12.12)      $OR \cos\theta + RP \sin\theta = OS + 0,$ 
             $-OR \sin\theta + RP \cos\theta = 0 + SP.$ 

But the line segments are simply the coordinates, viz., 

             $OR = a_1$, $RP = a_2$ and $OS = b_1$, $SP = b_2$. 

Hence, the final coordinates are expressed in terms of the initial ones, resulting from 
the rotation of axes through the angle $\theta$, by the following equations: 

(12.13)      $b_1 = a_1 \cos\theta + a_2 \sin\theta,$ 
             $b_2 = -a_1 \sin\theta + a_2 \cos\theta.$ 
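As a minimal sketch of equations (12.13), the rotation of a single point can be written in a few lines of Python. The function name `rotate_point` and the sample coordinates are illustrative assumptions and do not come from the text.

```python
import math

def rotate_point(a1, a2, theta_deg):
    """Coordinates (b1, b2) of a point after the axes are rotated through theta, per (12.13)."""
    t = math.radians(theta_deg)
    b1 = a1 * math.cos(t) + a2 * math.sin(t)
    b2 = -a1 * math.sin(t) + a2 * math.cos(t)
    return b1, b2

# Hypothetical point with coordinates (0.6, 0.8) on the original axes.
b1, b2 = rotate_point(0.6, 0.8, 30.0)

# The squared distance from the origin is the same in both reference systems.
print(round(b1, 4), round(b2, 4), round(b1**2 + b2**2, 4))
```

Only the description of the point changes under the rotation; its distance from the origin does not, which is why the last printed value equals the squared length of the original coordinates.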


It should be noted that the trigonometric terms are actually the direction cosines of 
the new axes with respect to the old ones, and the equations (12.13) may be put in 
the form: 

(12.14)      $b_1 = \lambda_{11} a_1 + \lambda_{21} a_2,$ 
             $b_2 = \lambda_{12} a_1 + \lambda_{22} a_2.$ 


For a set of n points there would be n pairs of equations (12.14) carrying each pair 
of coordinates $(a_{j1}, a_{j2})$ into the corresponding pair $(b_{j1}, b_{j2})$. This transformation 
may be put in matrix form, as follows: 

(12.15)      $(b_{j1} \;\; b_{j2}) = (a_{j1} \;\; a_{j2}) \cdot \begin{bmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \end{bmatrix}$ 


or, in more compact and generalizable terms: 

(12.16) B = AT. 

This expression is precisely of the same form as (12.1), but here the matrix B is to be 
derived from A upon the choice of the transformation matrix T. In the simple example 
of a rotation in a plane, the matrices A and B are each of order n × 2 and the matrix 
T is 2 × 2. The elements of T clearly satisfy the conditions 

             $\lambda_{11}^2 + \lambda_{21}^2 = 1, \quad \lambda_{12}^2 + \lambda_{22}^2 = 1, \quad \lambda_{11}\lambda_{12} + \lambda_{21}\lambda_{22} = 0$ 

so that the transformation is orthogonal in the sense of 4.6. 
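The matrix form B = AT and the three orthogonality conditions can be checked numerically. The following is only a sketch: the helper names (`planar_T`, `matmul`, `orthogonality_defects`), the angle, and the small pattern A are hypothetical and chosen for illustration.

```python
import math

def planar_T(theta_deg):
    """2 x 2 matrix of direction cosines for a rotation of axes through theta."""
    t = math.radians(theta_deg)
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t),  math.cos(t)]]

def matmul(X, Y):
    """Plain-Python matrix product XY."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def orthogonality_defects(T):
    """Left-hand sides of the three conditions minus their right-hand sides."""
    (l11, l12), (l21, l22) = T
    return (l11**2 + l21**2 - 1.0,
            l12**2 + l22**2 - 1.0,
            l11 * l12 + l21 * l22)

# Hypothetical n x 2 initial pattern A for three variables.
A = [[0.8, 0.3], [0.7, -0.4], [0.5, 0.5]]
T = planar_T(-30.0)
B = matmul(A, T)                     # B = AT, one row per variable

print(orthogonality_defects(T))      # all three entries should be essentially zero
print([[round(b, 4) for b in row] for row in B])
```

Each row of B is the pair of equations (12.14) applied to one variable, so the matrix product simply carries out the n pairs of equations at once.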

When a factor solution involves three factors, the transformation from an initial 
to a final pattern takes the form: 

             $b_{j1} = \lambda_{11} a_{j1} + \lambda_{21} a_{j2} + \lambda_{31} a_{j3},$ 
(12.17)      $b_{j2} = \lambda_{12} a_{j1} + \lambda_{22} a_{j2} + \lambda_{32} a_{j3},$ 
             $b_{j3} = \lambda_{13} a_{j1} + \lambda_{23} a_{j2} + \lambda_{33} a_{j3},$ 

where the preceding notation has been extended to an additional dimension. This 
transformation is covered by (12.16) if the factor matrices in that expression are 
presumed to be of order n × 3 and the transformation matrix of order 3 × 3. The 
elements of T, in this case, must satisfy the six independent conditions: 

             $\lambda_{11}^2 + \lambda_{21}^2 + \lambda_{31}^2 = 1,$ 
             $\lambda_{12}^2 + \lambda_{22}^2 + \lambda_{32}^2 = 1,$ 
             $\lambda_{13}^2 + \lambda_{23}^2 + \lambda_{33}^2 = 1,$ 
(12.18) 
             $\lambda_{11}\lambda_{12} + \lambda_{21}\lambda_{22} + \lambda_{31}\lambda_{32} = 0,$ 
             $\lambda_{11}\lambda_{13} + \lambda_{21}\lambda_{23} + \lambda_{31}\lambda_{33} = 0,$ 
             $\lambda_{12}\lambda_{13} + \lambda_{22}\lambda_{23} + \lambda_{32}\lambda_{33} = 0.$ 

Thus, the nine coefficients of (12.17), being subject to six conditions, afford only 
three degrees of freedom of rotation in ordinary space. Explicit equations for the 
b's in terms of the a's, involving only three independent parameters, can be obtained,* 
but are not employed in practical analyses. 

A form of transformation that is not only practical but which can be readily 
generalized to any number of factors will now be indicated. The principle underlying 
this method is that the result of successive orthogonal transformations is itself 
an orthogonal transformation, which is said to be the product of the successive 
rotations. Since a planar rotation is the simplest type, a transformation in three-space 
may be built up from such simpler rotations. Thus a transformation in ordinary 
space may involve the displacement of any two axes about the third, being in effect 
a rotation in a plane. Finally, a product of such rotations may be taken as the complete 
transformation. 

* These are known as Euler's formulas, a typical one being: 
$a_1 = b_1(\cos\varphi \cos\psi - \sin\varphi \sin\psi \cos\theta) - b_2(\cos\varphi \sin\psi + \sin\varphi \cos\psi \cos\theta) + b_3 \sin\varphi \sin\theta$, 
where the first subscript j has been dropped from the a's and b's and $\theta$, $\varphi$, $\psi$ are 
angles of rotation [430, p. 42]. 

The rotations can be arranged in a systematic order so that each axis is rotated 
with every other axis only once. The three rotations of pairs of axes in ordinary 
space may be indicated conveniently in the following manner: 


     Old Axes          Angle of Rotation          New Axes 

     $F_1 F_2$            $\theta_{12}$               $Y_1 Y_2$ 
     $Y_1 F_3$            $\theta_{13}$               $M_1 Y_3$ 
     $Y_2 Y_3$            $\theta_{23}$               $M_2 M_3$ 


It will be noted that the angle of rotation is denoted by $\theta$ with subscripts corresponding 
to the numbers of the axes involved in the rotation. The first rotation is made in the 
plane of $F_1$ and $F_2$, leaving $F_3$ unaltered. The new axes in this plane are designated 
by $Y_1$ and $Y_2$. Since $F_3$ is perpendicular to the plane of $F_1$ and $F_2$, it is perpendicular 
to any line in this plane. In particular, $F_3$ is perpendicular to the new axis $Y_1$. The 
next rotation is made in the plane of $Y_1$ and $F_3$, leaving $Y_2$ unchanged. The new first 
axis, denoted by $M_1$, may be regarded as final because it is the result of rotations 
with each of the other axes. The last rotation transforms $Y_2$ and $Y_3$ into the final 
coordinate axes $M_2$ and $M_3$. It will be observed that the Y's are merely auxiliary 
axes and, taken alone, do not comprise a transformation from the F's with the 
orthogonal properties (12.18). On the other hand, both sets of axes $Y_1, Y_2, F_3$ and 
$M_1, Y_2, Y_3$ have these orthogonal properties, and either one may be taken, in some 
instances, as the final reference system. The solution ordinarily desired, however, 
is one based upon the complete transformation of the original axes $F_1, F_2, F_3$ to the 
final $M_1, M_2, M_3$. 

Denoting the matrix of transformation of $F_1$ and $F_2$, leaving $F_3$ unchanged, by 

(12.19)      $T_{12} = \begin{bmatrix} \cos\theta_{12} & -\sin\theta_{12} & 0 \\ \sin\theta_{12} & \cos\theta_{12} & 0 \\ 0 & 0 & 1 \end{bmatrix},$ 

the first of the above rotations may be denoted by 

(12.20)      $C = A T_{12},$ 

where C is an intermediate matrix of coordinates with respect to $Y_1, Y_2, F_3$. The 
second and third rotations may be designated similarly, as follows: 

(12.21)      $D = C T_{13}$ and $B = D T_{23},$ 

where D is another intermediate matrix, and 

(12.22)      $T_{13} = \begin{bmatrix} \cos\theta_{13} & 0 & -\sin\theta_{13} \\ 0 & 1 & 0 \\ \sin\theta_{13} & 0 & \cos\theta_{13} \end{bmatrix}, \quad T_{23} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_{23} & -\sin\theta_{23} \\ 0 & \sin\theta_{23} & \cos\theta_{23} \end{bmatrix}.$ 


The three preceding rotations may be combined into a single transformation. 
Substituting the expression for D from the first into the second of equations (12.21) 
yields 

             $B = C T_{13} T_{23},$ 

and substituting (12.20) for C in the last equation produces 

(12.23)      $B = A T_{12} T_{13} T_{23}.$ 

Denoting the product of the three successive rotations by 

(12.24)      $T = T_{12} T_{13} T_{23},$ 

the expression (12.23) reduces to (12.16). In practice this matrix T cannot be determined 
in advance, but rather, the final pattern is derived as a result of the successive 
rotations. 
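The equivalence of the step-by-step route (12.20)-(12.21) and the single product (12.24) can be illustrated with a short sketch. The angles and the two-variable pattern A below are hypothetical, and `rot3` is an illustrative helper that lays out each planar rotation as in (12.19) and (12.22).

```python
import math

def matmul(X, Y):
    """Plain-Python matrix product XY."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def rot3(p, q, theta_deg):
    """3 x 3 rotation in the plane of axes p and q (0-based), third axis unchanged."""
    t = math.radians(theta_deg)
    T = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    T[p][p], T[p][q] = math.cos(t), -math.sin(t)
    T[q][p], T[q][q] = math.sin(t), math.cos(t)
    return T

# Hypothetical angles for the three planes and a hypothetical 2 x 3 pattern A.
T12, T13, T23 = rot3(0, 1, 20.0), rot3(0, 2, -35.0), rot3(1, 2, 10.0)
A = [[0.7, 0.2, 0.1], [0.6, -0.3, 0.4]]

# Successive rotations: C = A T12, D = C T13, B = D T23.
B_steps = matmul(matmul(matmul(A, T12), T13), T23)

# Single transformation with the product matrix T = T12 T13 T23.
T = matmul(matmul(T12, T13), T23)
B_once = matmul(A, T)

# T'T should be the identity, which is the set of six conditions (12.18).
TtT = matmul([list(col) for col in zip(*T)], T)
print([[round(x, 10) for x in row] for row in TtT])
```

Because each planar factor is orthogonal, the product is orthogonal as well, and the two routes to B agree up to rounding.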

The preceding methods can be generalized to a common-factor space of m 
dimensions. Employing the notation (12.10) for this general case, the matrix of 
transformation is now 

             $T = \begin{bmatrix} \lambda_{11} & \lambda_{12} & \cdots & \lambda_{1m} \\ \lambda_{21} & \lambda_{22} & \cdots & \lambda_{2m} \\ \vdots & \vdots & & \vdots \\ \lambda_{m1} & \lambda_{m2} & \cdots & \lambda_{mm} \end{bmatrix}$ 

and in the transformation (12.16) the initial and final factor matrices are each of 
order n × m. The sets of $\lambda$'s, by columns, are the direction cosines of the final reference 
axes $M_1, M_2, \cdots, M_m$ with respect to the original axes $F_1, F_2, \cdots, F_m$. These direction 
cosines are subject to the following set of independent conditions in order that the 
matrix T be orthogonal (see 4.6): 


(12.25)      $\sum_{r=1}^{m} \lambda_{rp} \lambda_{rq} = \delta_{pq} \qquad (p, q = 1, 2, \cdots, m; \; p \le q),$ 

where $\delta_{pq}$ is the Kronecker delta (zero when $p \ne q$ and unity when $p = q$). Since 
$p \le q$ and these indices range from 1 to m, the number of such conditions is $m(m+1)/2$. 
There is a total of $m^2$ parameters in matrix T, and, since these are subject to $m(m+1)/2$ 
restrictions, there remain 

(12.26)      $m^2 - m(m+1)/2 = m(m-1)/2 = \binom{m}{2}$ 

degrees of freedom of rotation in m-space. 

The number of independent parameters given in (12.26) may be associated with 
the same number of rotations in planes. The planes in which the rotations are made 
are determined by all possible pairs of reference axes. Of course, since the angles of 
rotation are determined subjectively, there is no reason to assume that the best 
overall transformation has been obtained after going through the $\binom{m}{2}$ unique planar 
rotations. Additional rotations in any plane may be made, if desired, and the new 
product of all the rotations becomes the final transformation. 
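The count in (12.26) and its association with pairs of axes can be sketched as follows. The function names are illustrative assumptions; the arithmetic is just the parameter count less the number of orthogonality conditions.

```python
from itertools import combinations

def rotation_degrees_of_freedom(m):
    """m^2 parameters less m(m+1)/2 orthogonality conditions, as in (12.26)."""
    parameters = m * m
    conditions = m * (m + 1) // 2
    return parameters - conditions

def rotation_planes(m):
    """All pairs of reference axes (numbered from 1), one plane per degree of freedom."""
    return list(combinations(range(1, m + 1), 2))

for m in (2, 3, 4, 12):
    print(m, rotation_degrees_of_freedom(m), len(rotation_planes(m)))
```

For every m the two counts agree, which is the sense in which each independent parameter may be associated with one planar rotation.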

As in the case of three factors, the above rotations may be organized in a 
systematic manner. For example, when four factors are involved, the scheme outlined 
in Table 12.2 may be followed. After each of the partial transformations has 
been obtained, the complete product matrix T should be determined and exhibited 
as part of the overall results of the factor analysis. In this way, anyone reading the 
results can verify the course from the initial solution to the final solution. Incidentally, 
the matrix T should be checked for orthogonality in the sense of (12.25), and it may 
also serve to check the calculation of the final factor coefficients. 


Table 12.2 

Scheme for Subjective Rotations in Four Space 

     Old Axes      Angle of Rotation      New Axes      Orthogonal to 

     $F_1 F_2$        $\theta_{12}$           $Y_1 Y_2$        $F_3 F_4$ 
     $Y_1 F_3$        $\theta_{13}$           $Z_1 Y_3$        $Y_2 F_4$ 
     $Z_1 F_4$        $\theta_{14}$           $M_1 Y_4$        $Y_2 Y_3$ 
     $Y_2 Y_3$        $\theta_{23}$           $Z_2 Z_3$        $M_1 Y_4$ 
     $Z_2 Y_4$        $\theta_{24}$           $M_2 Z_4$        $M_1 Z_3$ 
     $Z_3 Z_4$        $\theta_{34}$           $M_3 M_4$        $M_1 M_2$ 
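The bookkeeping of axis names in Table 12.2 follows a simple rule: an axis takes the next letter each time it is rotated, and it becomes final after it has been rotated with every other axis. A sketch of that rule is given below; the function `rotation_scheme` and the plain-text labels such as `theta_12` are illustrative assumptions.

```python
from itertools import combinations

def rotation_scheme(letters):
    """Rows (old axes, angle, new axes, orthogonal axes) of a systematic scheme.

    letters[k] is the letter used for an axis that has been rotated k times,
    so len(letters) equals the number of factors m.
    """
    m = len(letters)
    stage = [0] * m                          # rotations undergone by each axis so far

    def name(r):
        return f"{letters[stage[r]]}{r + 1}"

    rows = []
    for p, q in combinations(range(m), 2):   # each axis paired once with every other
        old = (name(p), name(q))
        stage[p] += 1
        stage[q] += 1
        new = (name(p), name(q))
        others = tuple(name(r) for r in range(m) if r not in (p, q))
        rows.append((old, f"theta_{p + 1}{q + 1}", new, others))
    return rows

for row in rotation_scheme("FYZM"):          # four factors, as in Table 12.2
    print(row)
```

Calling `rotation_scheme("FYM")` gives the three-rotation arrangement for ordinary space in the same way.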


The practical operation implied by the complete graphical method is the build-up 
of the transformation matrix T as the product of a series of rotations in planes. This 
involves $\binom{m}{2}$ two-dimensional plots and the matrix multiplications to get the 
intermediate coordinates, which is a very time-consuming process. However, after many 
years of experience, these methods have proven quite dependable and satisfactory 
to many workers in factor analysis. 

During the last two decades many attempts have been made to lighten the burden 
of the rotational task, and to set forth objective principles for the preferred multiple-factor 
solution. Some of these principles for simple structure were presented in 6.2. 

In chapters 14 and 15 some of the recent developments in analytical solutions are 
presented. Before getting to these more sophisticated procedures, however, some 
indication is in order of alternatives to the complete graphical method. Such methods 
for getting a multiple-factor solution still require some graphical work to accomplish 
the rotations from the initial solution. 

Because of the dependence of the simple-structure concept on subjective analysis 
of graphs, Thurstone has probably done more work than anyone else toward simplifying 
the chore of plotting graphs. As he states [477, p. 377]: “In the final acceptance 
or rejection of a simple-structure solution it is still the appearance of a set of graphs 
that determines the answer.” Thurstone was among the first to seek an alternative 
graphical method to the two-dimensional sections as described above. In 1938, he 
proposed a new method for rotation [471; 477, Chap. XI], consisting of three-dimensional 
sections, drawn in the plane of the paper however. The effect of a three-dimensional 
configuration in a plane is accomplished by extending the test vectors 
so that they all have unit projection on the first axis of the initial solution. In addition 
to the “method of extended vectors,” Thurstone offers several other alternatives for 
rotation, devoting more than one-fourth of his text to the specific task of finding 
graphical substantiation for the simple-structure solution. 

In another attempt to reduce the computational labor of rotation to simple 
structure, Zimmerman [549] proposes an ingenious scheme to eliminate the calculation 
of the numerical values of the intermediate coordinates. His procedure makes 
use of the principle of projection of a point from one graph to another, and involves 
only an ordinary drawing board, T-square, and drawing triangle. The two factor 
axes which are to be rotated are initially on two separate plots, with one coordinate 
for any point (representing a variable) projected horizontally from one plot and the 
other coordinate projected vertically from the second plot onto a new graph. Thus 
any axis can be rotated with any other without actual calculation of the numerical 
coordinates of the points. Only after all rotations have been made (to satisfy the 
appearance of simple structure), need one read and record the coordinates of the 
points with respect to the final axes. As a result of many rotations, and the inaccuracies 
of plotting, the basic relationship (2.50)—product of the factor matrix by its transpose 
—for the final rotated factors may not agree with that of the initial factors. 
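
The agreement that such a check looks for follows in one line from the orthogonality of T. As a sketch, writing a prime for the transpose and assuming that the final pattern is obtained exactly as B = AT in (12.16):

```latex
\[
BB' = (AT)(AT)' = A\,T\,T'\,A' = A\,I\,A' = AA' .
\]
```

Under this assumption any difference between the two products reflects accumulated plotting and reading error rather than a property of the rotation itself.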

Another labor-saving approach is through the use of mechanical or electrical 
devices. One such machine, known as the “Factor Matrix Rotator” (conceived by 
Richard Gaylord and described by Harman and Harper [206]), is located in 
Washington, D.C., in the Army Personnel Research Office. This machine is of the 
analog type, that is, the readings are in the nature of displacements along a scale so 
that the figures have to be estimated from calibrations on the scale rather than read 
as precise digital values. 

The initial factor weights are set into the machine by means of a series of dials; 
then the positions of the points representing the tests are viewed as points of light on 
a scope (cathode ray tube). At the beginning of the work, the dials representing the 
transformation matrix are set for the identity transformation so that the researcher 
can first view the plot of the points as they appear in the initial factor solution. The 
axes are rotated by a simple manipulation of a dial; when desirable positions of the 
axes are located, the researcher can cause the appropriate elements in the transformation 
matrix to be reset to take the new positions into account. 

Of course, the plots of points viewed on the scope are for a pair of axes at a time. 
An immediate advantage over hand methods arises from the fact that while a decision 
is being made regarding the rotation of a particular pair of axes, it is possible to view 
each of the remaining factor axes, in turn, with each of the two under consideration. 
The usual procedure employed by researchers using this machine is to view the plots 
in relation to all possible pairs of axes and on a schematic chart to note those planes 
in which rotation is most desirable and those that might be considered in order to 
keep the number of rotations to a minimum. When final decisions have been made 
about the location of factor axes to exhibit simple structure, the elements of both 
the transformation matrix and the final factor matrix can be obtained as fast as the 
experimenter can turn a dial and read a scale, since the computations are done 
electronically. 

The Factor Matrix Rotator is designed primarily to handle problems of up to 50 
tests and 12 factors and involving orthogonal rotations. Nonetheless, the machine 
has been applied to problems involving up to 130 tests and 24 factors. Further, it 
has been employed in connection with oblique transformations. However, applications 
beyond the basic capacity of the Rotator necessarily involve extensive additional 
computations off the machine. 

Before leaving the subject of alternatives to the complete graphical method, it 
should be remembered that all of these methods arose as a result of the simple 
structure concept, and the lack of any real objective means of arriving at such a solution. 
The work of Paul Horst [251] was among the very earliest in which the transformation 
from an arbitrary factor matrix into a simple-structure matrix was almost 
completely objective. The criterion he employs is that of maximizing the ratio of the 
sum of squares of the “significant” factor loadings to the sum of squares of all the 
loadings. A variant of this method is proposed by Tucker [489], in which he gets the 
solution for all factors simultaneously, using graphs for the selection of subgroups 
of tests. A detailed discussion of analytical methods for approximating simple 
structure is deferred to chapters 14 and 15. 

12.4. Numerical Illustrations of Orthogonal Multiple-Factor Solutions 

In obtaining an orthogonal multiple-factor solution, the reference axes are chosen 
in conformity with the discussion in 6.2. It is first necessary that the initial solution 
shall satisfy the criteria of the linear factor model (2.9), an orthogonal frame of 
reference, and parsimony of factors. Then an orthogonal rotation of such an initial 
pattern, in its common-factor space, will preserve these properties. The purpose of 
the transformation is to obtain a final pattern which also satisfies the criteria of low 
complexity, level contributions of factors, and hyper-planar fit; or more specifically, 
approximates the simple structure principles of 6.2. 

Although any initial solution with uncorrelated factors can be used to rotate to a 
multiple-factor pattern, the centroid solution has usually been employed in the past. 
The procedure is begun by plotting the points representing the variables in the plane 
of the first two initial factors $F_1$ and $F_2$. Since in an initial solution of the centroid 
form the second factor has both positive and negative weights, the points will lie 
in the first and fourth quadrants. The first rotation is then made through an angle $\theta_{12}$ 
such that all variables will have positive projections on the new axes $Y_1$, $Y_2$. Usually 
the angle $\theta_{12}$ so selected will be about -45°. Then, by applying equations of the type 
(12.13), the coordinates with respect to $Y_1$ and $Y_2$ are obtained. The next rotation is 
made in the plane of $Y_1$ and $F_3$ as indicated in the scheme exhibited in Table 12.2. 
Again, the angle $\theta_{13}$ is obtained by inspection of the graph. The new reference axis 
$Z_1$ should pass near a cluster of points while at the same time the other (orthogonal) 
axis $Y_3$ also should be near some other points. The variables represented by the first 
cluster of points will have high positive weights for the $Z_1$ factor, and the variables 
given by the second set of points will have low weights for this factor. Additional 
rotations may be made according to the outline given in Table 12.2. It will be evident 
from the following examples that the above procedure yields a final solution which 
approximates simple structure. 

1. Eight physical variables.—The first illustration of a multiple-factor solution is 
based upon the two minres factors for the eight physical variables, given in Table 9.3. 
The coefficients in this pattern are the coordinates, with respect to the two minres 
axes, of the eight points representing the variables. The plot of these points is given 
in Figure 12.2, in which it is apparent that the points fall into two distinct clusters. 
If two lines, $Y_1$ and $Y_2$, were passed through these clusters of points, they would produce 
excellent geometric fit to the data. Such axes, however, are not orthogonal and therefore 
not appropriate for the present method. 

If one axis is passed through a cluster and the other orthogonal to it, the standard 
of uncorrelated factors is met, but other standards are not well satisfied. Thus if an 
axis $M'_1$ is passed through the first four points, the other axis $M'_2$ will be far removed 
from the second cluster. The coefficients of such new factors would have the following 
properties: 

     Variables        Coefficients of $M'_1$        Coefficients of $M'_2$ 

     1, 2, 3, 4       Very high                   Near zero 
     5, 6, 7, 8       Fairly high                 High 

The first four variables would be of complexity one while the last four would be of 
complexity two. Variables 5-8 would not satisfy the criterion of low complexity for 
the present example involving only two factors. 

In an attempt to meet the basic standards for a multiple-factor pattern, the axes 
$M_1$ and $M_2$ are selected so as to be about equally removed from the two clusters of 
points. By inspection the resulting angle of rotation is taken to be $\theta_{12} = -42°$. 
Another worker, of course, might select a slightly different angle. The necessary 
trigonometric functions are 

             cos(-42°) = .7431,    sin(-42°) = -.6691. 

Substituting these values in equations (12.13), there results 

(12.27)      $b_{j1} = .7431\, a_{j1} - .6691\, a_{j2},$ 
             $b_{j2} = .6691\, a_{j1} + .7431\, a_{j2}.$ 

The transformation (12.27) may also be written in the equivalent form 

(12.28)      $B = A \begin{bmatrix} .7431 & .6691 \\ -.6691 & .7431 \end{bmatrix},$ 

where A is the pattern matrix of Table 9.3 and B is the final pattern which is presented 
in Table 12.3. 

Table 12.3 

Multiple-Factor Solution for Eight Physical Variables 
(Initial solution: Minres, Table 9.3) 

     Variable j                          $M_1$      $M_2$      $h_j^2$ 

     1. Height                           .853     .332     .838 
     2. Arm span                         .906     .261     .889 
     3. Length of forearm                .874     .237     .820 
     4. Length of lower leg              .846     .302     .807 
     5. Weight                           .175     .926     .888 
     6. Bitrochanteric diameter          .140     .788     .641 
     7. Chest girth                      .082     .760     .584 
     8. Chest width                      .216     .667     .492 

     Contribution of factor ($V_p$)     3.132    2.827    5.959 
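
The entries of the transformation matrix for a rotation of -42° and the summary columns of Table 12.3 can be recomputed from the tabled loadings. This is only a sketch: the variable names are illustrative, and because the printed loadings are rounded to three decimals, agreement with the printed communalities and contributions is expected only to about the third decimal.

```python
import math

# Loadings on the two rotated factors, taken from Table 12.3.
B = [(.853, .332), (.906, .261), (.874, .237), (.846, .302),
     (.175, .926), (.140, .788), (.082, .760), (.216, .667)]

# Transformation matrix for a rotation of -42 degrees, laid out as in (12.28).
t = math.radians(-42.0)
T = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]

# Communality of each variable: sum of its squared loadings across the factors.
communalities = [b1**2 + b2**2 for b1, b2 in B]

# Contribution of each factor: sum of squared loadings down its column.
V = [sum(row[p]**2 for row in B) for p in range(2)]

print([round(x, 4) for x in T[0]], [round(x, 4) for x in T[1]])
print([round(h, 3) for h in communalities])
print([round(v, 3) for v in V], round(sum(V), 3))
```

The sum of the communalities and the sum of the factor contributions are the same quantity computed by rows and by columns, which gives a quick arithmetic check on a table of this kind.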


The pattern of Table 12.3 may be examined now to see how well it conforms to 
the standards for a multiple-factor solution. Since there are no sampling error 
formulas for this type of solution, the analyst usually must set some arbitrary level 
of significance. The present example is based on a large number of observations 
(N = 305), and hence even relatively small coefficients may not be insignificant. 
As a rough approximation, a standard error of .066 is obtained from Table B in the 
Appendix, for an N = 305 and an average correlation of .355 in the original data. 
If this standard error were applied to the coefficients in Table 12.3, even the smallest 
value might be judged significant. While the foregoing test is not strictly applicable 
in the present case, it nevertheless throws some doubt on the insignificance of the 
small values. The multiple-factor pattern for these data is therefore not a good 
example of this type of solution. 

2. Thirteen psychological tests— The next example is based upon the centroid 
pattern, given in Table 8.30, for the thirteen psychological tests, involving three 
factors. The transformation to a multiple-factor solution in this case is made in 
accordance with the scheme outlined in 12.3 for rotations in ordinary space. In 
Figure 12.3 the thirteen points are plotted in the plane of the first two centroid axes. 
The procedure in the present example differs somewhat from that employed in the 
preceding one. The first rotation is made in order to accomplish a leveling of the 
contributions of the first two factors. An angle $\theta_{12} = 50°$ is selected by inspection 
for this purpose. In such a rotation all points have negative projections on the second 
axis, designated as $Y'_2$. This rotation is immediately followed by reflection of the 
second axis, namely, 

             $Y_2 = -Y'_2,$ 

so as to yield positive coordinates. Although many points will have appreciable 
loadings for both $Y_1$ and $Y_2$, this can be adjusted by subsequent rotations. 

Fig. 12.3. First rotation for thirteen psychological tests 

The resulting matrix of transformation, including the reflection of the temporary 
second axis, takes the form: 

             $T_{12} = \begin{bmatrix} .6428 & .7660 \\ .7660 & -.6428 \end{bmatrix}.$ 

Then, postmultiplying the first two columns of Table 8.30 by $T_{12}$ yields the first 
two columns of Table 12.4. The numerical calculations may be checked at this 
stage. The new factors $Y_1$, $Y_2$, and the factor $C_3$ are mutually orthogonal, and their 
total contribution should be the same as that of the original system $C_1$, $C_2$, $C_3$. 
Thus, 

             2.981 + 3.031 + .954 = 4.620 + 1.392 + .954 = 6.966. 

The next rotation is made in the $Y_1, C_3$-plane. The plot of points is presented in 
Figure 12.4, in which the coordinates are obtained from the first column of Table 12.4 
and the third column of Table 8.30. In this transformation the first multiple-factor 
axis is selected. Therefore, it is important that this axis pass near a cluster of points 
and also be about 90° removed from a number of other points. To satisfy these requirements, 
the angle $\theta_{13} = 28°$ is chosen. In this case one of the axes is reflected again to 
obtain positive coordinates. The transformation matrix in this plane is given by 

             $T_{13} = \begin{bmatrix} .8829 & .4695 \\ .4695 & -.8829 \end{bmatrix}.$ 

The resulting values of $M_1$ and $Y_3$ are recorded in the appropriate columns of 
Tables 12.4 and 12.5. 

Figs. 12.4-12.5. Second and final rotations for thirteen psychological tests 

The final rotation is made in the $Y_2, Y_3$-plane. For this transformation the last two 
multiple-factor axes are selected so as to pass as near as possible to clusters of points. 
Thus $M_2$ passes near points 10, 11, 12, and 13, while $M_3$ lies close to the points 
1, 2, 3, and 4 when the angle of rotation is taken to be $\theta_{23} = -23°$. The third transformation 
matrix is given by 

             $T_{23} = \begin{bmatrix} .9205 & .3907 \\ -.3907 & .9205 \end{bmatrix}.$ 

Table 12.4 

Intermediate Coordinates 

Table 12.5 

Multiple-Factor Solution for Thirteen Psychological Tests 
(Initial solution: centroid, Table 8.30) 

                                        Verbal    Speed    Spatial Relations    Communality 
     Test j                             $M_1$       $M_2$      $M_3$                  $h_j^2$ 

      1. Visual perception              .096      .248     .706                 .569 
      2. Cubes                          .102      .089     .425                 .199 
      3. Paper form board               .136     -.011     .602                 .381 
      4. Flags                          .193      .121     .516                 .318 
      5. General information            .702      .324     .243                 .657 
      6. Paragraph comprehension        .719      .214     .300                 .653 
      7. Sentence completion            .778      .243     .237                 .721 
      8. Word classification            .562      .291     .372                 .539 
      9. Word meaning                   .791      .198     .231                 .718 
     10. Addition                       .119      .757    -.100                 .597 
     11. Code                           .109      .651     .161                 .462 
     12. Counting dots                 -.083      .703     .210                 .545 
     13. Straight-curved capitals       .070      .619     .469                 .608 

     Contribution of factor ($V_p$)    2.670     2.292    2.005                6.967 


The coefficients of the factors $M_2$ and $M_3$ are recorded in Table 12.5. The complete 
transformation matrix which carries the original factor weights into the final ones 
may be summarized, according to (12.24), as follows: 

(12.29)      $T = T_{12} T_{13} T_{23} = \begin{bmatrix} .5675 & .5872 & .5771 \\ .6763 & -.7322 & .0799 \\ .4695 & .3499 & -.8127 \end{bmatrix}.$ 


The numerical check on the total contribution of a factor system, which can be made 
after each rotation, may again be employed on the final set of factors. Thus the 
numbers appearing in the last line of Table 12.5 sum to 6.967, which is the same as 
the total contribution of the original centroid factors. 
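
The product (12.24) for this example can be formed directly from the three planar matrices. The sketch below rests on assumptions that should be read with care: the 2 × 2 matrices given for each plane are embedded in 3 × 3 matrices in the manner of (12.19) and (12.22); the angle of the second rotation is taken as 28° because cos 28° = .8829 and sin 28° = .4695 are the entries printed for $T_{13}$; and the reflected axis in each of the first two rotations is inferred from the signs in the printed matrices. The helper `rot3` is illustrative.

```python
import math

def matmul(X, Y):
    """Plain-Python matrix product XY."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def rot3(p, q, theta_deg, reflect=None):
    """Rotation in the plane of axes p, q (0-based); optionally reverse one new axis."""
    t = math.radians(theta_deg)
    T = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    T[p][p], T[p][q] = math.cos(t), -math.sin(t)
    T[q][p], T[q][q] = math.sin(t), math.cos(t)
    if reflect is not None:
        for row in T:                 # reversing an axis changes the sign of its column
            row[reflect] = -row[reflect]
    return T

T12 = rot3(0, 1, 50.0, reflect=1)     # second axis reflected after the rotation
T13 = rot3(0, 2, 28.0, reflect=2)     # assumed angle and reflected axis (see above)
T23 = rot3(1, 2, -23.0)

T = matmul(matmul(T12, T13), T23)     # T = T12 T13 T23
TtT = matmul([list(col) for col in zip(*T)], T)

for row in T:
    print([round(x, 4) for x in row])
```

The printed rows can be compared entry by entry with (12.29), and `TtT` can be inspected to confirm that the product satisfies the orthogonality conditions (12.25).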

The multiple-factor pattern of Table 12.5 satisfies the standards listed above. The 
contributions of the factors are relatively level in comparison with other types of 
preferred solutions. The criteria of low complexity and good geometric fit also appear 
to be satisfied. For the present sample (N = 145) it is judged that a factor coefficient 
of two-tenths is insignificant. In general, the solution tends to satisfy the criteria for 
simple structure set forth in 6.2. The above solution thus affords a good illustration 
of the multiple-factor type. 

In naming the multiple factors, those variables having definitely significant weights, 
say, greater than four-tenths, are considered (indicated in bold face type in Table 12.5). 
The subgroups of tests identifying the multiple factors are the same as those employed 
in naming the first three group factors in the bi-factor solution of Table 7.6. The same 
names are then assigned to the multiple factors, as indicated in Table 12.5. It may be 
noted that each test is essentially a measure of only one of these factors, except 
Test 13, which appears to be reasonable inasmuch as the test is a measure of speed 
of perception of simple geometric forms. 


12.5. Other Problems of Relationships between Factor Solutions 

It should be perfectly clear that many different factor solutions are possible for a 
given set of variables for a single sample of individuals. Not only is this evident 
from the several techniques for factoring a correlation matrix, developed earlier in 
the text, but more particularly from the considerations of transforming one solution 
into another (in this chapter and the three following ones). The relationships between 
any two solutions for a given body of data have been shown in 12.2 above. 

A procedure developed by Tucker [496] for factor analysis of a three-mode matrix 
(individuals measured on a number of variables on several occasions) may be considered 
as an alternative to the comparison of ordinary factor solutions (for a group 
of individuals measured on a number of variables) obtained on several different 
occasions. 

There are two additional problems that seem to be of more importance to the 
development of content-area (e.g., psychological) theories based upon factor analysis. 
Suppose the same variables (or, at least, several variables common to the two 
batteries) are measured for two distinct groups of individuals. What can be said about 
the resulting factors used in the description of the identical variables? The other 
problem is concerned with two distinct sets of variables (designed to measure the 
same traits) for which factor analyses are obtained for a single sample of individuals. 

In each situation two sets of factors are obtained and the problem is to determine 
the extent of similarity or dissimilarity between them. Of course, factorial similarity 
is a matter of degree rather than coincidence. While the ultimate objective in 
psychology may be the formulation of some theories on the invariance of factors, 
the treatment in the present text is limited to an exposition of several measures for 
the degree of agreement between factors obtained in different solutions. 

Before considering these measures, it might be well to point out some of the 
principal work on the “invariance” problem. Perhaps the point of departure for the 
present-day theoretical work in this area was the demonstration by Ahmavaara [3] 
of the invariance of a factor solution upon the selection of samples (satisfying certain 
conditions) from a population. Several other works (e.g., [24], [73], [79], [255]) 
have made some contributions to this area, but two papers by Meredith [365, 366] 
are of special significance. In the first of these, he shows that there exists a factor 
pattern for a given set of variables that is invariant with respect to sampling from a 
parent population, provided the “selection does not occur directly on the observable 
variables and does not reduce the rank of the system.” In the second paper, Meredith 
develops procedures for transforming factor solutions based on different populations 
to conform to a single “best fitting” factor pattern, and illustrates these procedures 
with examples involving four groups of individuals. 
1. Fixed variables, different samples.—Since factor analysis has been employed 
as a tool in the development of psychological theories, dealing with cognitive, 
educational, and temperamental traits, considerable interest has been directed 
toward the “identification” of factors from one study to another. Ideally, this problem 
could be solved in a manner similar to the establishment of standards for weight or 
mass in the physical sciences, as suggested by Mosier [372] and Young and Householder
[546]. Thus, in the field of psychological aptitudes, a “set of r independent 
tests is to be assigned any non-singular r × r array of factor loadings once and for 
all, and then future testing is to be linked to this standard by always carrying a set of 
r independent tests (or individuals) from each experiment to the next...” [546, p. 51]. 
It is not very likely that psychologists will take this approach very seriously. Instead, 
they are more likely to appeal to statistical criteria for a measure of coincidence or 
agreement of factors obtained in one study with those of another. 

Attempts to link factors of separate investigations go back to the earliest days of 
the development of factor analysis. Generally, rough methods of inspection and 
personal impressions are offered as the basis of the “identification”. A good example 
of this approach is the report of Zachert and Friedman [548], in which they “draw 
inferences concerning the stability of the factor pattern” for postwar and wartime 
samples by noting that a given factor tends to involve the same variables with loadings 
of .30 or greater in each of the four samples. In notes by Barlow and Burt [24] and 
Leyden [341], attention is called to a variety of measures that have been proposed 
over the years, and to the divergent values that can be obtained with the different 
indices. 

Since the variables are assumed to be the same in the two studies, a common 
statistical measure, the root mean square, might be used to determine the extent 
of agreement between corresponding factor weights. Such an index for comparing 
factor p of study 1 and factor q of study 2 may be put in the form: 

(12.30)    $\mathrm{rms}(p - q) = \sqrt{\sum_{j=1}^{n} \left({}_1a_{jp} - {}_2a_{jq}\right)^2 / n}$,

where the prefixes “1” and “2” are employed to distinguish the factor weights in the 
two studies. Of course, this is a very simple kind of index from which it might be 
difficult to ascertain what is really “good agreement”, knowing that perfect agreement
would yield a root mean square of zero. 
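
The index (12.30) can be sketched in a few lines of Python. This is only an illustration: the function name `rms_difference` and the two columns of factor weights are hypothetical and are not taken from any study cited in the text.

```python
import math


def rms_difference(weights_1, weights_2):
    """Root mean square of the differences between corresponding
    factor weights of two studies, in the form of (12.30)."""
    if len(weights_1) != len(weights_2):
        raise ValueError("both columns must cover the same n variables")
    n = len(weights_1)
    squared = sum((a - b) ** 2 for a, b in zip(weights_1, weights_2))
    return math.sqrt(squared / n)


# Hypothetical weights of four variables on factor p (study 1)
# and on factor q (study 2).
factor_p = [0.80, 0.70, 0.10, 0.20]
factor_q = [0.75, 0.72, 0.15, 0.10]
print(round(rms_difference(factor_p, factor_q), 4))
```

Identical columns give an index of zero. As the text remarks, the index by itself does not say how small a value must be to count as good agreement.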

The simple expedient of employing an index roughly resembling a coefficient of 
correlation has been used by several investigators to compare the weights of a fixed 
set of variables on two factors (presumed to be identical, or at least, suspected to 
have a high degree of relationship). Burt [58, p. 185] proposes as a proportionality 
criterion the “unadjusted correlation” between the two sets of factor coefficients.* 
Tucker [491, p. 43] develops a coefficient of congruence, to study the agreement 
between factors in two studies, which is precisely the same as Burt’s unadjusted 
correlation. Again, Wrigley and Neuhaus [541] present the same formula for measuring
the degree of factorial similarity. In the notation introduced in the last paragraph, 
the coefficient of congruence is simply 

(12.31)    $\varphi_{pq} = \dfrac{\sum_{j=1}^{n} {}_1a_{jp}\,{}_2a_{jq}}{\sqrt{\left(\sum_{j=1}^{n} {}_1a_{jp}^{2}\right)\left(\sum_{j=1}^{n} {}_2a_{jq}^{2}\right)}}$.

* Actually, Burt compares a set of factor coefficients of twelve temperamental traits for a 
“general emotionality” factor with a teacher’s set of independent gradings for “general emo- 


While this formula is similar in form to the product-moment coefficient defined 
in (2.7), it certainly is not a correlation: the a’s are not deviates from their respective 
means and the summations are over the n variables instead of the number of individuals.
If the n variables of study 1 are identical with those of study 2, the application 
of formula (12.31) to the corresponding numbers in the two columns representing the 
factors p and q is straightforward. If only a subset of the variables is common 
to the two studies, the summations in (12.31) must be understood to apply only to 
such variables. Since a small number of variables usually will be common to two 
studies, it is evident that the coefficient of congruence will be high so long as there 
are factor weights with like algebraic signs in the two instances. The coefficient of 
congruence can range in value from +1 for perfect agreement (or −1 for perfect 
inverse agreement) to zero for no agreement whatsoever. 
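
A minimal sketch of formula (12.31) in Python follows. The function name `congruence` and the weights are hypothetical, chosen only to show that like algebraic signs are enough to produce a high coefficient.

```python
import math


def congruence(weights_1, weights_2):
    """Coefficient of congruence (12.31) between two columns of factor
    weights. The sums run over the variables common to both studies and
    the weights are NOT taken as deviations from their means."""
    if len(weights_1) != len(weights_2):
        raise ValueError("both columns must cover the same variables")
    numerator = sum(a * b for a, b in zip(weights_1, weights_2))
    denominator = math.sqrt(
        sum(a * a for a in weights_1) * sum(b * b for b in weights_2)
    )
    return numerator / denominator


# Hypothetical weights: all positive, but the two profiles run in
# opposite directions (one falls while the other rises).
factor_p = [0.8, 0.7, 0.6]
factor_q = [0.3, 0.5, 0.9]
print(round(congruence(factor_p, factor_q), 3))
```

For these made-up weights the coefficient is about .86, although a product-moment correlation computed from the same six numbers would be negative. The contrast illustrates why (12.31) should not be read as a correlation.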

To illustrate the calculation of (12.31), the data of Table 12.1 will be employed. 
While it is convenient to use the data at hand, it should be clear that a coefficient 
of congruence would not ordinarily be of interest for two different factor solutions 
of a set of variables for the same individuals. Considering the two solutions A and B 
as the studies “1” and “2,” respectively, the only new calculations required are the 
denominators of (12.31), and the quotients, to get all nine coefficients of congruence; 
the numerators are the elements of B'A in Table 12.1. The results are shown in Table 
12.6. Of course, the coefficients of congruence confirm the relationships among these 
factors discussed in 12.2. 




Table 12.6 

Coefficients of Congruence between Oblique Primary 
and Orthogonal Multiple Factors 
In situations where factors from two studies can be matched visually, and the 
number of variables common to the studies is small, it can be expected that the 
coefficient (12.31) will be very high. Tucker [491] analyzes two studies (one involving
18 variables for a sample of Naval Recruits and the other involving 44 
variables for a sample of Airmen and Soldiers) in which 10 variables are common 
and the six factors of the smaller study are matched with six out of the twelve factors 
of the larger. He accepts coefficients ranging from .999984 down to .939811 as defining 
congruent factors, but rejects a value of .459717 as “definitely low so that this factor 
will not be considered as a congruent factor” [491, p. 19]. 

It is generally recommended that each factor of one study be compared with all 
the factors of the other study, and be paired with the one with which it has the highest 
coefficient of congruence. This implies a considerable amount of work, for which 
high-speed electronic computers have been applied. While the statistical relationships
between factors of different studies have certain intrinsic values, a number of 
workers (e.g., Cattell, Tucker, Wrigley) are even more concerned with the development
of psychological theories through these means. They would employ the indices 
of proportionality as stepping stones to an objective basis for the matching of factors 
and the subsequent rotation to identical positions of the factor axes in the two studies. 
Tucker, for example, defines a congruent space between two studies as that spanned 
by their congruent factors, where the two matrices of these factor loadings are 
“considered as congruent if they are generally similar, with only relatively small 
random differences” [491, p. 18]. He then discusses the degree of confidence a psychologist
might have in the representation of the same mental function by congruent 
factors in two studies. 

Several other indices are proposed by Pinneau and Newhouse [391] for the comparison
of factors based on a fixed set of variables on the same or different samples. 
Their “coefficient of invariance” and “coefficient of factor similarity” involve the 
correlation between the measurements of the factors (see chap. 16) for the individuals. 
This procedure follows the work of Horst [254, 255]. 

The measure of consistency of factors, for a fixed set of variables, from sample to 
sample would seem to be a classical problem in the theory of statistical sampling. 
However, little progress has been made toward the solution of the sampling problem 
in factor analysis, which is the subject of this section.* Therefore the empirical 
approach, employing indices of proportionality of factors, which is suggested in this 
section seems not inappropriate at this time for the “identification” of factors across 
different studies. 

2. Different variables, fixed sample.—Another situation where it is of interest to 
compare factors from different studies is when the sample of individuals is fixed 
but the sets of variables in two studies are different. While a statistical theory of the 
stability of factors under sampling variation of variables may be developed some 
day, the present approach to the problem is along empirical lines. 

* Some sampling theory in factor analysis regarding the number of common factors was 
covered in 9.5 and 10.4. 
An important consideration to the psychologist when making a choice out of 
several solutions is the question of “invariance” of the particular solution. Invariance 
as used in the context of rotation to a preferred multiple-factor solution requires 
that the factorial description of a variable must remain the same when it is part of 
one battery or another which involves the same common factors. In this quest for 
basic traits in psychology, Tucker adds the requirement that “the scores for individuals
on a factor should remain invariant as the individuals are tested with different 
batteries which involve the factor” [495, p. 112]. While the varimax method (see 14.4) 
is proposed by Kaiser as the factorially invariant solution [293, pp. 193-98], the 
problem considered in the present section is concerned more with the measure of 
stability of factors when different variables are used. 

Wrigley and Neuhaus [541] propose what appears to be the most natural method 
for matching factors determined from two different sets of variables for the same 
sample of individuals. In each study the measurements of the respective factors for 
the individuals can be obtained by the methods of chapter 16. Then, for a given 
sample of individuals, the measurements of a factor p from one set of variables may 
be compared with the measurements of a factor q from the other set of variables in 
a manner similar to that above. As before, a coefficient of congruence can be defined 
for measuring the degree of factorial similarity, namely, 

(12.33)    $\varphi_{pq} = \dfrac{\sum_{i=1}^{N} {}_1F_{pi}\,{}_2F_{qi}}{\sqrt{\left(\sum_{i=1}^{N} {}_1F_{pi}^{2}\right)\left(\sum_{i=1}^{N} {}_2F_{qi}^{2}\right)}}$,


where ${}_1F_{pi}$ is the factor measurement for individual i on a factor p for study 1, with 
similar interpretation for measurements of factor q of study 2. The coefficient of 
congruence (12.33) may range in value from −1 to +1 just as the index (12.31). 

Assuming $m_1$ and $m_2$ factors in the two studies, there is possible a total of $m_1 \times m_2$ 
matchings of each factor of one study with every factor of the other. A given factor 
of one study may be said to be “matched with” or “congruent to” that factor of the 
other study with which it has the highest coefficient of congruence (12.33). If there 
should be any conflicts, Wrigley and Neuhaus [541] propose that the factors be paired 
in such a way as to make the sum of the indices (12.33) a maximum. 
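
The pairing rule for resolving conflicts can be sketched as an exhaustive search. This is a minimal illustration in Python, not the procedure of [541] itself: the function name `best_pairing` and the table of coefficients are hypothetical, and trying every assignment is practical only for a small number of factors.

```python
from itertools import permutations


def best_pairing(table):
    """Pair each factor of study 1 (rows) with a distinct factor of
    study 2 (columns) so that the sum of the coefficients of congruence
    is a maximum. Assumes no more rows than columns."""
    m1, m2 = len(table), len(table[0])
    if m1 > m2:
        raise ValueError("expected no more rows than columns")
    best_total, best_columns = None, None
    # Try every way of assigning distinct columns to the rows.
    for columns in permutations(range(m2), m1):
        total = sum(table[p][q] for p, q in enumerate(columns))
        if best_total is None or total > best_total:
            best_total, best_columns = total, columns
    return list(enumerate(best_columns)), best_total


# Hypothetical coefficients: both factors of study 1 have their highest
# coefficient with the first factor of study 2, which is a conflict.
table = [[0.95, 0.90],
         [0.93, 0.40]]
pairs, total = best_pairing(table)
print(pairs, round(total, 2))
```

In this made-up table the conflict is resolved by pairing the first factor of study 1 with the second factor of study 2, since .90 + .93 exceeds .95 + .40.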

While the foregoing index provides a means for comparing factors obtained from 
different sets of variables, there is considerable interest among psychologists in 
bringing the factors from such studies into confluence. Specific work in this direction 
has been carried out by Wrigley and Neuhaus [541], Tucker [495], and Meredith [366], 
but is outside the scope of the present text. 
Oblique Multiple-Factor Solutions 


13.1. Introduction 

In the early history and development of factor analysis, solutions in terms of 
uncorrelated factors were tacitly assumed to be the only type permissible. Then, in 
the 1940’s the notion of correlated factors not only became acceptable, but frequently 
preferable to uncorrelated ones. For the field of psychology, Thurstone states the 
case as follows [477, p. vii]: “If we impose the restriction that the reference frame 
shall be orthogonal, then we are imposing the condition that the factors or parameters
shall be uncorrelated in the experimental population or in the general 
population.... It seems just as unnecessary to require that mental traits shall be 
uncorrelated in the general population as to require that height and weight be uncorrelated
in the general population. This admission of oblique axes has also been 
debated and is not yet generally accepted by students of factor theory.” 

The development of the different types of factor solutions in this text has been 
made systematically, including both orthogonal and oblique. An enumeration and 
properties of different types of solutions are presented in chapter 6, with methods 
for their calculation treated in subsequent chapters. The first oblique solution is 
introduced in chapter 11, although an associated orthogonal solution is also derived 
(to be used either as the final solution or as the starting point for the rotational 
process). 

In the present chapter, any assumption of uncorrelated factors is discarded and 
procedures leading to oblique solutions are considered. It is clear that a certain 
simplicity of interpretation is sacrificed upon relinquishing the standard of orthogonality.
This disadvantage may be offset, however, if the linear descriptions of the 
variables in terms of correlated factors can be made simpler than in the case of 
uncorrelated ones. Generally this is possible. Hence the oblique pattern proposed 
may not merely be an oblique multiple-factor solution satisfying the criteria for 
simple structure, but may actually approximate the uni-factor form. 
An oblique solution of the type indicated may be obtained directly from clustering 
of variables as in chapter 11. The purpose of the present chapter is to provide a method 
of analysis when the direct method of chapter 11 is not applicable (although in such 
event the oblique solution is not very adequate). This method consists of the transformation
of some initial orthogonal pattern to an oblique solution. The communalities
of the variables, which are determined by the common-factor space of 
the preliminary solution, remain invariant under this transformation. Therefore, 
the entire development will be made in the common-factor space. 

The geometric setting for the oblique form of solution is presented in 13.2. This is 
followed by a detailed outline in 13.3 of the procedure for getting an oblique “primary 
factor” solution. Then an alternative approach in terms of “oblique reference axes” 
is developed in 13.4. The distinction between the two types of oblique solutions is 
presented in 13.5, where existing ambiguities in the literature on oblique solutions 
are clarified. 

13.2. Geometric Basis for an Oblique Solution 

In 2.5 the definitions of factor patterns and structures were formulated. When 
the factors are uncorrelated these concepts are identical. Therefore, in the foregoing 
text (with the exception of chapter 11), no distinction was necessary, and the term 
“pattern” was used synonymously with “solution.” When correlated factors are 
employed, however, a solution comprises both the pattern and structure. The 
distinction in this case can best be shown geometrically. 

A pattern, in terms of common factors only, may be represented as follows: 

(13.1)    $z''_j = b_{j1}T_1 + b_{j2}T_2 + \cdots + b_{jm}T_m \qquad (j = 1, 2, \ldots, n)$,

where b’s are employed to denote coefficients of correlated factors. As pointed out 
in 4.10, the double prime denotes a variable projected into the common-factor 
space. Since the analysis of this chapter is entirely in the common-factor space, it 
will simplify matters to drop the primes and write the foregoing equation: 

(13.2)    $z_j = b_{j1}T_1 + b_{j2}T_2 + \cdots + b_{jm}T_m \qquad (j = 1, 2, \ldots, n)$.

The coefficients may be considered as the coordinates of a point $P_j$ with respect to 
the factor axes. This interpretation may be made whether the factors are represented 
by orthogonal or oblique axes. 

For the case of two factors these ideas may be illustrated by Figure 13.1. When 
the correlation between the factors $T_1$ and $T_2$ is known, the unit vectors representing 
$T_1$ and $T_2$ are separated by an angle $\theta_{12} = \arccos r_{T_1T_2}$. The oblique reference system 
is thus determined. Any variable $z_j$ is represented by a vector, OP, whose length and 
direction are determined by its coordinates. Again, for the sake of simplicity, the 
point representing any variable $z_j$ is designated P instead of $P_j$, and its coordinates 
are shown as $(b_1, b_2)$ instead of $(b_{j1}, b_{j2})$ with respect to the two oblique axes $T_1$ and 
$T_2$. From the definition of general Cartesian coordinates, given in 4.8, it may be noted 
that the coordinates $b_1$, $b_2$ are given by the line segments OQ and OR, respectively. 
For the hypothetical variable $z_j$ in Figure 13.1, the first coordinate is positive and 
greater than unity, while the second coordinate is negative and less than unity. 

The length of the vector corresponding to $z_j$ can be determined by means of 
formula (4.39), which in this case may be written as follows: 

(13.3)    $D^2(OP) = \sum_{p=1}^{2}\sum_{q=1}^{2} b_{jp}b_{jq}\cos\theta_{pq}$
          $\phantom{D^2(OP)} = b_{j1}b_{j1}\cos\theta_{11} + b_{j1}b_{j2}\cos\theta_{12} + b_{j2}b_{j1}\cos\theta_{21} + b_{j2}b_{j2}\cos\theta_{22}$.

In this formula each of the angles $\theta_{11}$ and $\theta_{22}$ is equal to zero, and $\theta_{12}$ is the angle 
between the reference axes. Hence $\cos\theta_{11} = \cos\theta_{22} = 1$ and $\cos\theta_{12} = r_{T_1T_2}$. The 
expression (13.3) then reduces to 

(13.4)    $D^2(OP) = b_{j1}^2 + b_{j2}^2 + 2b_{j1}b_{j2}r_{T_1T_2}$.

The right-hand member is the communality $h_j^2$ of the variable $z_j$. The length of the 
vector is then equal to the square root of the communality, namely, 

(13.5)    $D(OP) = h_j$.

Fig. 13.1. Distinction between coordinate and correlation in oblique reference system. 

The geometric interpretation of the correlation of a variable with a factor will be 
given next. Let the angle between the vector corresponding to $z_j$ and the reference 
vector $T_1$ be denoted by $\phi$. Also let the projections of the end point P upon the $T_1$ 
and $T_2$ axes be M and N, respectively, as indicated in Figure 13.1. From the right 
triangle OMP, it is apparent that 

$\cos\phi = \dfrac{D(OM)}{D(OP)}$.

This formula reduces to 

(13.6)    $D(OM) = h_j\cos\phi$
upon making use of (13.5). Employing the expression (4.56) for a reproduced correlation
in the common-factor space, formula (13.6) can be simplified further. In the 
present example, one variable is $z_j$ and the other is $T_1$ so that their scalar product, 
according to (4.56), is 

(13.7)    $r'_{jT_1} = h_j h_{T_1}\cos\phi$.

Since the length, or “communality,” of any factor is unity, this expression may be 
written in the form 

(13.8)    $r_{jT_1} = h_j\cos\phi$.

In this formula, and in all other representations of correlations of variables with 
factors, the prime is dropped for simplicity. Substituting (13.8) into (13.6), the projection
of a vector upon a reference axis may finally be expressed as follows: 

(13.9)    $D(OM) = r_{jT_1}$.

In a similar manner it can be shown that the projection, D(ON), of the vector $z_j$ on 
the $T_2$ axis is the correlation of the variable with the second factor. Of course, the 
correlation between two factors is also given by the projection of either reference 
vector upon the other. 

By referring to Figure 13.1, the distinction between a coordinate and a correlation 
can be seen clearly. The coordinates may be positive or negative and may be greater 
than one. A correlation coefficient also may be positive or negative but can never 
exceed unity. It may also be observed that the coordinates and correlations approach 
coincidence as the reference vectors approach orthogonality. 
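
The distinction can be made concrete with a small numerical sketch in Python. The function names and the numbers are hypothetical. The communality follows (13.4); the projection formulas are obtained by taking the scalar product of the vector $b_1T_1 + b_2T_2$ with each unit reference vector, a step that is assumed here rather than quoted from the text.

```python
import math


def communality(b1, b2, r12):
    """Squared length of the vector OP, formula (13.4)."""
    return b1 * b1 + b2 * b2 + 2.0 * b1 * b2 * r12


def projections(b1, b2, r12):
    """Orthogonal projections of OP on the T_1 and T_2 axes, i.e. the
    correlations of the variable with the two oblique factors."""
    return b1 + b2 * r12, b1 * r12 + b2


# Hypothetical variable: oblique coordinates (1.2, -0.5), with a
# factor correlation of .8 assumed between T_1 and T_2.
b1, b2, r12 = 1.2, -0.5, 0.8
h = math.sqrt(communality(b1, b2, r12))
r1, r2 = projections(b1, b2, r12)
print(round(h, 4), round(r1, 4), round(r2, 4))
```

With these made-up values the first coordinate exceeds unity while both correlations stay below it. Setting the factor correlation to zero makes the projections equal to the coordinates, in line with the remark about orthogonality.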

A complete solution involving correlated factors must consist of a pattern and a 
structure. The factor pattern may be exhibited as in equation (13.1) or, more compactly,
in a table giving the coefficients of the factors. The structure usually is presented 
in tabular or matrix form. In addition to the pattern and structure, an oblique solution 
should include a table of intercorrelations of factors. 

13.3. Computing Procedures for Oblique Primary-Factor Solution 

The form of oblique multiple-factor solution developed in this section leads to a 
set of factors which have been called “primary” by Thurstone [477, chap. XV], when 
considered in the context of psychological traits. Aside from the particular application,
and although arrived at somewhat differently, it is convenient to designate the 
oblique solution of this section as “primary-factor” to distinguish it from another 
form of oblique solution considered in the next section. 

Starting with any initial orthogonal pattern, it is possible to transform it into an 
oblique primary-factor solution which, more or less, satisfies the conditions of simple 
structure. The procedures outlined in this section are intended for hand methods, 
and therefore would be useful only for small problems. However, the heuristic value 
goes far beyond the immediate applications. 

1. Initial orthogonal pattern.—Any preliminary orthogonal solution (e.g., 
principal-factor, minres, maximum-likelihood) may be employed. The two minres 
factors for the eight physical variables, given in Table 9.3, serve as the starting point 
from which an oblique primary-factor solution will be derived. 

2. Reduced pattern.—Since the coordinate axes representing the oblique factors 
will be made to pass through the centroids of clusters of variables, it is necessary to 
identify such groupings, say, $G_p$ ($p = 1, 2, \ldots, m$). These groups of variables may be 
established from the earlier analysis which led to the orthogonal solution, or they 
may be determined by the method of B-coefficients (see 7.4); or they may be found 
by inspection of the initial pattern or graphs of the points. Just by inspection of 
Table 9.3, it is evident from the algebraic signs of the coefficients that the plot of these 
eight points with respect to the two minres axes would lead to a cluster in the fourth 
quadrant for the first four variables, and a cluster in the first quadrant for the last 
four. The actual plot of these points in Figure 13.2 bears this out. 

To determine the directions of the oblique reference vectors, lines may be drawn 
by inspection from the origin through the clusters of variables. The angles which these 
new axes make with the old may be measured, and the transformation may thus be 
determined. Such a procedure is completely subjective, and may be replaced by the 
following operational method. 

A composite variable $v_p$ is assumed through each cluster of $n_p$ variables in group 
$G_p$. This type of variable was designated $T_p$ in 11.3, but it will be found more convenient
in the present development to reserve $T_p$ for the standardized form of the 
oblique factor. For the example, the two composite variables are defined by 

(13.10)    $v_1 = z_1 + z_2 + z_3 + z_4$ and $v_2 = z_5 + z_6 + z_7 + z_8$.

The variance of a composite variable is given by 

(13.11)    $s_{v_p}^2 = \sum (r_{jk};\ j, k \in G_p)$,

in which there are $n_p$ self-correlations of unity. Applying this formula to the two 
composite variables in (13.10), employing the correlations from Table 5.3, the 
resulting standard deviations are found to be 

(13.12)    $s_{v_1} = 3.7465$ and $s_{v_2} = 3.4117$.

It is now possible to express the composite variables in terms of the factors of the 
initial solution. The coefficients in such linear expressions are also the correlations 
of the composite variables with the factors (the factors in the initial solution being 
uncorrelated). Since the factors are standardized variables, the correlation of any 
composite variable $v_p$ with a factor F is given by 

(13.13)    $r_{v_pF} = \sum (r_{jF};\ j \in G_p)/s_{v_p}$,

where the individual correlations of the variables with the factor are, of course, the 
coefficients in the initial orthogonal pattern for the variables comprising the group 
$G_p$. The correlation for the first composite variable of the example with the first 
minres factor is calculated as follows: 

$r_{v_1F_1} = (.856 + .848 + .808 + .831)/3.7465 = .8923$,

where the values in the numerator are taken from Table 9.3. In a similar manner, 
the correlations of each of the composite variables with each of the minres factors 
can be obtained, and the results presented in the form of a reduced factor pattern, 
as follows: 

(13.14)    $u_1 = v_1/s_{v_1} = .8923F_1 - .3969F_2$,
           $u_2 = v_2/s_{v_2} = .7495F_1 + .5639F_2$.

The standardized form of a composite variable v is indicated by u. The reduced 
pattern equations (13.14) have the same properties as those for individual variables. 
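
The two steps (13.11) and (13.13) can be sketched in Python as follows. The function names are hypothetical. Because Table 5.3 is not reproduced here, the correlation matrix in the first call is made up purely for illustration; the second call uses the four first-factor coefficients and the standard deviation quoted above.

```python
import math


def composite_sd(group_correlations):
    """Standard deviation of an unweighted sum of standardized
    variables, per (13.11): the square root of the sum of every entry
    (unit diagonal included) of the group's correlation matrix."""
    return math.sqrt(sum(sum(row) for row in group_correlations))


def composite_loading(loadings, sd):
    """Correlation of a composite with one initial factor, per (13.13)."""
    return sum(loadings) / sd


# Hypothetical 3-variable group (NOT the data of Table 5.3).
hypothetical_r = [[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.3],
                  [0.4, 0.3, 1.0]]
print(round(composite_sd(hypothetical_r), 4))

# First-factor coefficients of variables 1-4 and s_v1 as given in the text.
first_factor = [0.856, 0.848, 0.808, 0.831]
print(round(composite_loading(first_factor, 3.7465), 4))
```

The second printed value reproduces the coefficient .8923 of $F_1$ in the first equation of (13.14).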

3. Transformation matrix.—Since the oblique primary-factors are represented by 
the coordinate axes passing through the points corresponding to the composite 
variables $u_p$, the reduced pattern provides the basis for determining the transformation
matrix. The coefficients of the reduced pattern give the coordinates of the 
composite points, and division by the respective distances of these points from the 
origin produces the direction cosines of the lines through them. These lines are 
precisely the oblique factors $T_p$ and hence these direction cosines with respect to the 
initial reference system constitute the elements of the transformation matrix. 

For example, the distance of the first point from the origin is 

$\sqrt{(.8923)^2 + (-.3969)^2} = .9766$

according to (4.12). Then the direction cosines of $T_1$ with respect to the $F_1$ and $F_2$ 
axes are given by: 

$t_{11} = .8923/.9766 = .9137$ and $t_{21} = -.3969/.9766 = -.4064$.

The direction cosines of the $T_2$ axis with respect to the orthogonal axes $F_1$ and $F_2$ 
are determined in a similar manner. The resulting transformation matrix may be 
written as follows: 

(13.15)    $T = \begin{pmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{pmatrix} = \begin{pmatrix} .9137 & .7991 \\ -.4064 & .6012 \end{pmatrix}$.

While this matrix is written specifically for the illustrative example, the ideas and 
notation are generalizable to any number of factors. The symbol T is used in general 
for the transformation matrix for any number of axes. 
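
The construction of the transformation matrix from the reduced pattern can be sketched in Python. The function name `direction_cosines` is hypothetical; the input is the reduced pattern (13.14), and the layout (one column of T per oblique factor) follows (13.15).

```python
import math


def direction_cosines(reduced_pattern):
    """Scale each row of the reduced pattern to unit length and return
    the transformation matrix T, whose columns are the scaled rows."""
    scaled = []
    for row in reduced_pattern:
        length = math.sqrt(sum(c * c for c in row))  # distance from origin
        scaled.append([c / length for c in row])
    n_axes = len(reduced_pattern[0])
    # Transpose so that column p of T holds the cosines of factor T_p.
    return [[scaled[p][k] for p in range(len(scaled))] for k in range(n_axes)]


# Reduced pattern (13.14): rows are u_1 and u_2, columns are F_1 and F_2.
reduced_pattern = [[0.8923, -0.3969],
                   [0.7495, 0.5639]]
T = direction_cosines(reduced_pattern)
for row in T:
    print([round(t, 4) for t in row])
```

Rounded to four places, the rows printed agree with the matrix in (13.15). The same function applies unchanged to more than two factors, given a reduced pattern with one row per cluster.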

4. Factor correlations.—After the direction cosines of the oblique reference 
vectors have been obtained, the correlations among such factors can be determined. 
The sum of paired products of the direction cosines of two vectors gives the cosine 
of the angle between them, according to (4.47); and, by (4.48), this cosine is equal to 
the correlation between the two variables represented by the vectors. Hence, the 
correlation between $T_1$ and $T_2$ in the present example is 


$r_{T_1T_2} = .9137(.7991) - .4064(.6012) = .4858$.


The self-correlations, or variances, of the factors may be calculated in the same 
manner. These sums of squares by columns in the transformation matrix must be 
unity, and thus provide a check on the calculation of the elements of the transformation
matrix. 

More generally, the scalar product between any pair of vectors $T_p$ and $T_q$ is equal 
to their correlation according to (4.56) and (4.54). Such scalar products can be 
designated in matrix form, as follows: 

(13.16)    $\Phi = T'T$.

From the correlation between factors the angle of separation of the reference 
vectors can be determined if desired. In the present case this angle is given by 

$\theta_{12} = \arccos .4858 = 61°$.
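
The calculation (13.16), together with the check on the column sums of squares and the angle of separation, can be sketched in Python. The function name `factor_correlations` is hypothetical; the matrix T is that of (13.15).

```python
import math


def factor_correlations(T):
    """Matrix of factor correlations, Phi = T'T, formula (13.16)."""
    n_factors = len(T[0])
    return [
        [sum(row[p] * row[q] for row in T) for q in range(n_factors)]
        for p in range(n_factors)
    ]


# Transformation matrix (13.15); columns are the oblique factors.
T = [[0.9137, 0.7991],
     [-0.4064, 0.6012]]
phi = factor_correlations(T)
angle = math.degrees(math.acos(phi[0][1]))

print([[round(x, 4) for x in row] for row in phi])
print(round(angle))
```

The diagonal entries come out as unity to four places, which is the check described above, and the off-diagonal entry gives the factor correlation of .4858.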


5. Factor structure.—The projections of the vectors representing the variables 
upon the oblique primary-axes can now be determined. As indicated in (13.9), such 
projections are the correlations of variables with the factors, i.e., the elements of the 
factor structure. In order to derive formulas for the structure values, Figure 13.3 
has been constructed. In this figure the oblique axes $T_1$ and $T_2$ have been taken in the 
first quadrant of the $F_1$, $F_2$ reference system only to simplify the development, the 
results being the same regardless of the quadrants in which the oblique axes are 
located. The angles from the $F_1$ axis to the $T_1$ and $T_2$ axes are denoted by $\alpha$ and $\beta$, 
respectively. 



Fig. 13.3.—Derivation of structure values for primary-factor system 


Any variable $z_j$ may be represented in this figure by a point P whose coordinates 
$(a_{j1}, a_{j2})$ with respect to the original minres axes are the coefficients in the initial 
pattern equation. The variable may also be construed as a vector from the origin 
to the point P. The angle from the $F_1$ axis to this vector is denoted by $\phi$. Then the 
projection of the vector upon the $T_1$ axis is given by 

(13.17)    $D(OM) = D(OP)\cos(\phi - \alpha)$.

As noted in (13.9), the projection D(OM) is equal to the correlation $r_{jT_1}$, and from 
(13.5) the length of the vector D(OP) is $h_j$. Making these substitutions in (13.17), 
and expanding the cosine of the difference of two angles, this formula becomes 

$r_{jT_1} = h_j(\cos\phi\cos\alpha + \sin\phi\sin\alpha)$
$\phantom{r_{jT_1}} = (h_j\cos\phi)\cos\alpha + (h_j\sin\phi)\sin\alpha$.

Now $h_j\cos\phi$ and $h_j\sin\phi$ are the projections $a_{j1}$ and $a_{j2}$ of the vector representing 
$z_j$ on the $F_1$ and $F_2$ axes, respectively. Then the formula finally becomes 

(13.18)    $r_{jT_1} = a_{j1}\cos\alpha + a_{j2}\sin\alpha$.

In a similar manner it can be shown that the projection D(ON), of the vector 
representing $z_j$ on the $T_2$ axis, is given by 

(13.19)    $r_{jT_2} = a_{j1}\cos\beta + a_{j2}\sin\beta$.

These results may be summarized in the following matrix form: 

(13.20)    $(r_{jT_1},\ r_{jT_2}) = (a_{j1},\ a_{j2}) \begin{pmatrix} t_{11} = \cos\alpha & t_{12} = \cos\beta \\ t_{21} = \sin\alpha & t_{22} = \sin\beta \end{pmatrix}$,

where, in the last matrix, the elements of the first column are the direction cosines 
of $T_1$ with respect to $F_1$ and $F_2$, and those of the second column are the direction 
cosines of $T_2$. Although (13.20) was developed on the basis of Figure 13.3, this expression
is true for different positions of the new axes $T_1$ and $T_2$, as, for example, 
that indicated in Figure 13.2. 

Written in the form (13.20), the procedure for determining the elements of a 
structure can be generalized to problems involving more than two factors. Thus, 
the transformation from an initial orthogonal factor pattern A to the oblique 
structure S is given by: 

(13.21) S = AT, 

where the transformation matrix T contains in its columns the direction cosines of 
the oblique axes with respect to the orthogonal frame of reference. 

For example, the minres pattern matrix of Table 9.3 is multiplied by the transformation
matrix in (13.15) to get the structure values of Table 13.1. The correlations 
of the composite variables with the oblique factors are obtained by multiplying the 
reduced pattern matrix of (13.14) by the same transformation matrix. These values 
are presented in the reduced structure of Table 13.1. 

6. Factor pattern.—To complete the solution in terms of correlated factors, the 
linear descriptions of the variables are required as well as their correlations with the 
factors. The pattern coefficients are the coordinates with respect to the oblique axes 
of the points representing the variables and could have been obtained directly* 
from the initial factor pattern. It is more convenient, however, to calculate these
values after the oblique factor structure has been obtained, making use of the
relationships between pattern and structure developed in 2.8.


* It is possible to obtain the coordinates of the points with respect to the oblique reference
system directly by transformation of the original coordinates. Since, in factor analysis, both the
coordinates and the projections (i.e., the coefficients and the correlations) are desired, the present
approach is suggested as the simplest and the one best adapted to systematic calculations. First
the projections are obtained, and then in the next stage of the analysis the coordinates are
calculated.

Table 13.1

Oblique Primary-Factor Solution for Eight Physical Variables
(Initial solution: Minres, Table 9.3)

                 Structure: S              Pattern: P
  Variable    ------------------    --------------------------
     j        r_jT1      r_jT2      T1             T2
                                    (Lankiness)    (Stockiness)
     1         .914       .489       .885           .059
     2         .942       .430       .960          -.036
     3         .904       .400       .929          -.053
     4         .898       .458       .884           .028
     5         .453       .943      -.007           .946
     6         .377       .800      -.015           .807
     7         .313       .761      -.074           .797
     8         .412       .696       .097           .649

              Reduced Structure     Reduced Pattern
     u1        .977       .474       .977          -.001
     u2        .456       .938       .000           .938



If the oblique factor pattern for the illustrative example is denoted by

(13.22) $z_j = b_{j1}T_1 + b_{j2}T_2 \qquad (j = 1, 2, \ldots, 8)$,

the problem is to determine the coefficients $b_{j1}$, $b_{j2}$. For any variable $z_j$, multiply
(13.22) by $T_1$ and $T_2$ in turn, sum for the $N$ values, and divide by $N$. The resulting
equations are

(13.23)
$$\begin{aligned} r_{jT_1} &= b_{j1} + b_{j2}\,r_{T_1T_2}, \\ r_{jT_2} &= b_{j1}\,r_{T_2T_1} + b_{j2}. \end{aligned}$$

There is such a pair of simultaneous equations for determining the two unknowns
$b_{j1}$, $b_{j2}$ for each variable $z_j$. In equations (13.23) the left-hand members contain the
known elements of the factor structure. The correlation between the factors also is
known from paragraph 4 above. Hence, the matrix of coefficients of the unknown
$b$’s is

$$\Phi = \begin{bmatrix} 1 & .4858 \\ .4858 & 1 \end{bmatrix}$$



and remains the same for all variables. The factor coefficients in this simple case may 
be calculated by the method of determinants; but, especially when many factors 
are involved, more efficient procedures are desired. Such procedures are described 
in 3.4, and applied below (in Table 13.2) to the present example. 

Table 13.2

Calculation of a Pattern from a Structure

Schematic                                                      Numerical Example
----------------------------------------------------------    --------------------------------------
Original Matrices       Computed Matrices                      Original Matrices    Computed Matrices

Factor Correlations†    Square Root Matrix
$\Phi$                  $Q'$                                   1.0000               1.0000
(Assume $\Phi = Q'Q$)   (Square root operation on $\Phi$)       .4858  1.0000        .4858   .8741

Identity Matrix
$I$                     $Q^{-1}$                               1.0000  0            1.0000  -.5558
                        (Square root operation on $I$)         0       1.0000       0       1.1440

                        $\Phi^{-1} = Q^{-1}(Q')^{-1}$                               1.3089  -.6358
                        (Row-by-row multiplication of                               -.6358  1.3087
                        $Q^{-1}$ by itself)

Factor Structure        Factor Pattern
$S$                     $P$                                     .914    .489         .885    .059
                        ($P = S\Phi^{-1}$)                      .942    .430         .960   -.036
                        (Row-by-row multiplication of           .904    .400         .929   -.053
                        $S$ by $\Phi^{-1}$)                     .898    .458         .884    .028
                                                                .453    .943        -.007    .946
                                                                .377    .800        -.015    .807
                                                                .313    .761        -.074    .797
                                                                .412    .696         .097    .649

                                                                .977    .474         .977   -.001
                                                                .456    .938         .000    .938

† The square root decomposition of $\Phi$ into $Q'Q$ should not be confused with (13.16) in which the
correlations among the factors are obtained by premultiplying the transformation matrix $T$ by its transpose.

First, the preceding results are extended to any number of variables and factors.
Corresponding to the equations (13.23) for two oblique factors there are developed
in 2.8 the matrix relationships between the common-factor portions of an oblique
pattern and structure. In a notation more suitable to this chapter equation (2.43)
becomes:

(13.24) $S = P\Phi$,

which states that the factor structure $S$ is equal to the pattern matrix $P$
postmultiplied by the matrix $\Phi$ of factor correlations. Solving this equation explicitly for the
factor pattern yields:

(13.25) $P = S\Phi^{-1}$.

While this equation can be employed to get the factor pattern from the known
structure values and correlations of factors, the work implied in getting the inverse
of $\Phi$ for many factors can be very substantial without the use of high-speed
computers. For a small number of factors the work of computing the oblique factor
pattern can best be done by using the square root method. This is illustrated for the
simple example in Table 13.2. The resulting values of the factor coefficients are
repeated in the pattern matrix in Table 13.1, including those for the composite
variables.
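
The square root method of Table 13.2 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; it uses the factor correlation .4858 and the structure values of Table 13.1, and the variable names are illustrative rather than the author's. NumPy's Cholesky routine returns the lower-triangular factor, which plays the role of $Q'$ in the table.

```python
import numpy as np

# Factor correlations and factor structure, as given in Tables 13.1 and 13.2.
Phi = np.array([[1.0, 0.4858],
                [0.4858, 1.0]])
S = np.array([[0.914, 0.489], [0.942, 0.430], [0.904, 0.400], [0.898, 0.458],
              [0.453, 0.943], [0.377, 0.800], [0.313, 0.761], [0.412, 0.696]])

# Square root decomposition: Phi = Q'Q, with Q' lower triangular.
Q_t = np.linalg.cholesky(Phi)

# Inverse of the upper-triangular matrix Q.
Q_inv = np.linalg.inv(Q_t.T)

# Phi^{-1} = Q^{-1} (Q')^{-1}, i.e. row-by-row multiplication of Q^{-1} by itself.
Phi_inv = Q_inv @ Q_inv.T

# Pattern from structure, as in (13.25).
P = S @ Phi_inv

print(np.round(Q_t, 4))
print(np.round(Phi_inv, 4))
print(np.round(P, 3))
```

The printed pattern should agree with the pattern columns of Table 13.1 to within rounding of the three-place structure values. For larger problems, solving the triangular systems directly avoids forming an explicit inverse; the explicit form is kept here only to mirror the worksheet.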

An alternative formula can be developed for calculating the oblique pattern, 
involving the initial factor pattern instead of the oblique structure. The structure is 
expressed in terms of the initial pattern in (13.21), and the factor correlation matrix 
is expressed in terms of the transformation matrix in (13.16). Making use of these 
relationships, formula (13.25) then becomes: 

(13.26) $P = A(T')^{-1}$.

This expression also involves the calculation of the inverse of a matrix of order equal
to the number of factors, and hence suffers from the same limitations as the preceding
formula. If one wishes to employ $A$ instead of $S$ for determining $P$, then the
worksheet of Table 11.2 will be found convenient.
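
The step from (13.25) to (13.26) can be written out as follows. This is a short derivation sketch that assumes $T$ is nonsingular and takes (13.16) in the form $\Phi = T'T$, as described in the footnote to Table 13.2.

```latex
\begin{aligned}
P &= S\,\Phi^{-1}                 && \text{by (13.25)} \\
  &= (AT)\,(T'T)^{-1}             && \text{by (13.21) and (13.16)} \\
  &= A\,T\,T^{-1}\,(T')^{-1}      && \text{inverse of a product} \\
  &= A\,(T')^{-1}.
\end{aligned}
```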

7. Contributions of oblique factors.—After an oblique factor pattern has been 
obtained, the direct and joint contributions of these factors can be determined. The 
communality of a variable $z_j$, as given by (13.2), may be expressed as follows:

(13.27) $h_j^2 = b_{j1}^2 + b_{j2}^2 + \cdots + b_{jm}^2 + 2b_{j1}b_{j2}\,r_{T_1T_2} + \cdots + 2b_{j,m-1}b_{jm}\,r_{T_{m-1}T_m}$.

The terms in the right-hand member of this equation represent the portions of the 
communality of zj ascribable to the respective factors. The direct contributions of 
the factors are given by the first m terms, while the joint contributions of the factors 
are furnished by the remaining terms. 

In 2.4 the total contribution of a factor to the variances of all the variables was
defined for the case of uncorrelated factors. When the factors are correlated, their
contributions to the variances of the variables can come about through their
interaction with other factors as well as through their individual impact. Thus, total
contributions of oblique factors can be obtained by summing the separate
contributions, as exhibited in (13.27), over all the variables. It is convenient to designate these
as total direct contributions:

(13.28) $V_p = \sum_{j=1}^{n} b_{jp}^2 \qquad (p = 1, 2, \ldots, m)$,

and as total joint contributions:

(13.29) $V_{pq} = 2\,r_{T_pT_q} \sum_{j=1}^{n} b_{jp}b_{jq} \qquad (p, q = 1, 2, \ldots, m;\ p < q)$.

These two sets of expressions can be arranged conveniently in a triangular matrix
in which the direct contributions are put in the diagonal and the joint contributions
in the lower part of the triangle.

In the illustrative example the total direct contributions of the two factors are
given by $V_1 = 3.365$ and $V_2 = 2.611$, while the total joint contribution of these factors
is $V_{12} = -.021$. The grand total of the contributions of a set of oblique factors
should, of course, be equal to the total communality of the original solution. In the
present example this total (5.955) agrees within a few points in the last decimal place
with the original value from Table 9.3. The total direct contributions of the factors
account for all but a negligible amount of the total common-factor variance in the
particular example.
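
The direct and joint contributions of (13.28) and (13.29) can be recomputed from the pattern values. The following is a minimal sketch, assuming NumPy; the pattern coefficients are the three-place values of Tables 13.1 and 13.2 and the factor correlation is .4858, so the results match the quoted totals only to within rounding. The variable names are illustrative.

```python
import numpy as np

# Pattern coefficients b_j1, b_j2 for the eight physical variables.
P = np.array([[0.885, 0.059], [0.960, -0.036], [0.929, -0.053], [0.884, 0.028],
              [-0.007, 0.946], [-0.015, 0.807], [-0.074, 0.797], [0.097, 0.649]])
r12 = 0.4858  # correlation between the primary factors T1 and T2

# Total direct contributions V_1 and V_2, as in (13.28).
direct = (P ** 2).sum(axis=0)

# Total joint contribution V_12, as in (13.29).
joint = 2.0 * r12 * (P[:, 0] * P[:, 1]).sum()

# Grand total, to be compared with the total communality.
total = direct.sum() + joint

print(np.round(direct, 3), round(joint, 3), round(total, 3))
```

With more than two factors the same computation extends to every pair $p < q$, for instance by looping over the off-diagonal elements of the factor correlation matrix.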

A striking similarity may be noted between the procedures outlined in this section
and those in 11.3. The methods are quite similar, the distinction being in the data
to which the methods are applied. Here the raw data are the coefficients of an initial 
orthogonal factor pattern, while in chapter 11 the raw data are the correlations 
among the variables themselves. 

13.4. Oblique Reference Solution

An alternative form of oblique solution is presented in this section. In an endeavor 
to satisfy the intuitive principles of simple structure, Thurstone [477, chap. XV] 
devised an ingenious procedure to guarantee a certain number of zeros in the factor 
solution. To this end, he introduced a reference system consisting of the normals 
to the coordinate hyperplanes of the primary-factor solution of the preceding section. 
While this procedure tends to produce the desired number of zeros, it does complicate 
the situation by considering a second reference system. In the remainder of this 
section an exposition is presented of this new “oblique reference system,” and then 
in the following section the distinction and functional relationship between this and 
the “primary-factor system” is developed. 

For a common-factor space of $m$ dimensions there are, of course, $m$ reference axes
$\Lambda_p$ ($p = 1, 2, \ldots, m$), each of which is normal to a coordinate hyperplane $\pi_p$ (of
$m - 1$ dimensions) as defined in 4.3. In ordinary space, the hyperplanes are actual
planes, and the three reference axes are normal to the three coordinate planes. For a
common-factor space of only two dimensions, the hyperplane becomes a line, and
its normal is another line at right angles to it. Specifically for the example of the
preceding section, the “hyperplane” $\pi_1$ in the $T_1T_2$-plane is the space with the $T_1$
axis missing, in other words, the $T_2$ axis; and similarly the $\pi_2$ hyperplane is the $T_1$
axis. Then, the reference axis $\Lambda_1$ is perpendicular to $\pi_1$ ($= T_2$) and $\Lambda_2$ is perpendicular
to $\pi_2$ ($= T_1$), as indicated in Figure 13.4. It is evident from this figure that the four points
hovering close to $T_1$ have projections on $\Lambda_2$ very close to zero. Similarly, the variables
close to $T_2$ have near zero projections on $\Lambda_1$.

Without drawing a three-dimensional diagram, three oblique coordinate planes
$\pi_1$, $\pi_2$, and $\pi_3$ can be imagined. The normals to these planes are the reference axes
$\Lambda_1$, $\Lambda_2$, and $\Lambda_3$, respectively. For each of these axes, every point in the plane to which
it is orthogonal will have zero projection upon it. This will be true whether the
points in the coordinate plane cluster around the intersections of these planes
($T_1$, $T_2$, and $T_3$) or fan out in these planes.

The generalization of these properties to any $m$-space is immediate. All points lying
in, or close to, a hyperplane $\pi_p$ will have near zero projections on the normal $\Lambda_p$ to
this hyperplane. It is probably this property, which guarantees the first criterion for
simple structure (see 6.2), that led Thurstone to the choice of the reference axes.
Also, by requiring that each hyperplane be “overdetermined,” i.e., contain at least $m$
points, the second criterion for simple structure is assured. Thus it can be seen why
Thurstone developed the oblique solution on the basis of reference axes which would
be normal to geometric spaces, so that every point in such a space would have a zero
projection at least on the axis at right angles to it. Concentrating on the zero
projections, Thurstone plays down the factor pattern associated with the reference axes.
For heuristic reasons, the complete oblique solution in terms of such reference axes
will be developed here, including the factor pattern as well as the factor structure.

Fig. 13.4

Just as in the preceding section, the oblique solution in terms of the new reference
axes is obtained by rotation of some initial orthogonal solution. Conventionally, the
transformation in this case is designated by $\Lambda$ and the resulting oblique factor
structure by $V$, while in this text $A$ is employed for the initial matrix of coefficients of the
orthogonal factors rather than $F$ which might be confused with the factors themselves.
Then in place of (13.21) there is the following equation:

(13.30) $V = A\Lambda$.

Again, as in the transformation matrix $T$ of (13.21), the matrix $\Lambda$ contains in its
columns the direction cosines of the oblique axes ($\Lambda_p$) with respect to the orthogonal
frame of reference.

Knowing the previous transformation matrix $T$ it is possible to determine the
desired matrix $\Lambda$ without resorting to graphical methods. The elements of $T$ are the
direction cosines of the $T$-axes (passing through the clusters of variables or determined
by the intersections of the coordinate hyperplanes containing a spread of the variables).
These axes may be viewed as additional vectors (like the variables) in the original
orthogonal reference frame. In this manner they may be considered as extensions of
the factor pattern $A$, just as the reduced pattern in the preceding section. Since the
matrix $T$ was written with the direction cosines in columns, it is necessary to take
the transpose of $T$ in the extension of $A$. Then applying the transformation (13.30) to
this continuation of $A$, there results

(13.31) $T'\Lambda = D$,

which is a matrix of the scalar products or correlations among vectors $T_p$ and $\Lambda_p$
($p = 1, 2, \ldots, m$). Since $\Lambda_p$ is normal to the hyperplane

$$\pi_p = OT_1T_2 \cdots )T_p( \cdots T_m,$$

it is uncorrelated with every $T$-axis except $T_p$. Hence, the diagonal values of $D$ are
the correlations between corresponding $\Lambda$ and $T$ factors, while all values off the
diagonal are zero, i.e., $D$ is a diagonal matrix.

From the relationship (13.31), the explicit expression for the transformation matrix
$\Lambda$ becomes:

(13.32) $\Lambda = (T')^{-1}D$.

From the previous knowledge of $T$, the inverse of its transpose can be computed.
However, this computation may be rather laborious because $T$ is not a symmetric
matrix. From the relationship (13.16), the inverse of $T$ can be obtained when the
inverse of the symmetric matrix $\Phi$ is known. Taking the inverse of both sides of
(13.16) produces

(13.33) $\Phi^{-1} = T^{-1}(T')^{-1}$

and then premultiplying both sides by $T$ produces the desired result:

(13.34) $(T')^{-1} = T\Phi^{-1}$.

Then $\Lambda$ is obtained by normalizing the columns of this matrix, i.e., dividing each
element in a column by the square root of the sum of the squares of all the elements
in that column.

After the transformation matrix $\Lambda$ is determined, the initial orthogonal factor
pattern is postmultiplied by it to obtain the new factor structure $V$, according to
(13.30). The intercorrelations of the reference factors $\Lambda_p$ can be obtained by the
following matrix multiplication:

(13.35) $\Lambda'\Lambda = \Psi$,

where the resulting matrix is designated $\Psi$ to distinguish it from the matrix $\Phi$ of
correlations among the factors $T$. Then, to complete this oblique solution, the factor
pattern $W$ in terms of the reference axes $\Lambda_p$ is computed by either of the formulas:

(13.36) $W = V\Psi^{-1}$,

which corresponds to (13.25), or

(13.37) $W = A(\Lambda')^{-1}$,

which corresponds to (13.26).

To illustrate the foregoing, the example of eight physical variables again is
employed. First, the transformation matrix $\Lambda$ is required, and it is obtained by use of
formula (13.32). The $(T')^{-1}$ for this formula is calculated by means of (13.34) using the
values for $T$ and $\Phi^{-1}$ from the preceding section, producing

$$(T')^{-1} = \begin{bmatrix} .6879 & .4649 \\ -.9142 & 1.0452 \end{bmatrix}.$$

Then, normalizing by columns produces:

$$\Lambda = \begin{bmatrix} .6012 & .4064 \\ -.7991 & .9137 \end{bmatrix}$$

as the transformation matrix to the new reference axes.

The new factor structure $V$ is obtained by multiplying the original minres pattern
of Table 9.3, and the reduced pattern matrix (13.14), by this transformation matrix,
and the results are shown in Table 13.3. The new factor pattern $W$ is calculated by
(13.36), using the algorithm of Table 13.2 with $\Psi$ in place of $\Phi$, and $V$ in place of $S$.
The resulting pattern $W$ is given in Table 13.3. The correlation between $\Lambda_1$ and $\Lambda_2$,
obtained from the general expression (13.35), is simply the sum of the two
cross-products of the direction cosines in $\Lambda$, namely $-.4858$.*

Table 13.3

Oblique Reference Solution for Eight Physical Variables
(Initial solution: Minres, Table 9.3)

                 Structure: V              Pattern: W
  Variable    ------------------    ------------------
     j        r_jΛ1      r_jΛ2       Λ1         Λ2
     1         .774       .052      1.046
     2         .839      -.032      1.078
     3         .813      -.045      1.034
     4         .773       .025      1.027
     5                    .827       .518
     6                    .706       .431
     7                    .697       .358       .871
     8                    .567       .471       .796

              Reduced Structure     Reduced Pattern
     u1        .854      -.000      1.118
     u2       -.000       .820       .522

It may also be of some interest to compute the diagonal matrix $D$,

$$D = T'\Lambda = \begin{bmatrix} .8741 & 0 \\ 0 & .8741 \end{bmatrix},$$


giving the correlations between corresponding T and A factors. For the simple case 
of only two factors, each of these correlations is .8741. 
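
The numerical steps of this section can be retraced as follows. This is a minimal sketch, assuming NumPy; it starts from the matrix $(T')^{-1}$ quoted above, with the lower-left entry taken as negative in agreement with the normalized matrix, and recovers $T$ by inversion rather than reading it from (13.15). The variable names are illustrative.

```python
import numpy as np

# (T')^{-1} for the eight physical variables, as quoted in the text.
T_t_inv = np.array([[0.6879, 0.4649],
                    [-0.9142, 1.0452]])

# Normalize each column to unit length to obtain the transformation matrix Lambda.
Lam = T_t_inv / np.linalg.norm(T_t_inv, axis=0)

# Intercorrelations of the reference axes, as in (13.35).
Psi = Lam.T @ Lam

# Recover T by inverting (T')^{-1}, then form D = T' Lambda as in (13.31).
T = np.linalg.inv(T_t_inv).T
D = T.T @ Lam

print(np.round(Lam, 4))
print(np.round(Psi, 4))
print(np.round(D, 4))
```

The off-diagonal elements of `D` vanish by construction, and its diagonal elements are the reciprocals of the column lengths of $(T')^{-1}$.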


13.5. Relationship between Two Types of Oblique Solutions 

In the preceding two sections, the several parts that make up an oblique multiple- 
factor solution were outlined very explicitly for the set of T-axes and the set of A-axes. 
The T-factors of 13.3 are called primary factors by Thurstone, and the A-factors of 
13.4 are merely called reference axes. He uses the structure of the latter system as an 
indication of the pattern of the former system, and thereby identifies the primary 
factors. 

The reference axes lie in the common-factor space and are said to be bi-orthogonal 
to the primary factors. Certainly the reference factors of 13.4 would seem more like 
mathematical abstractions than the primary factors of 13.3. The only reason for the 
strong interest of the Thurstone school in the simple reference structure V is its 
similarity to the primary-factor pattern, which is shown in (13.43) below. The
bi-orthogonal system of coordinate axes may have been an ingenious idea in the 1930’s,
but certainly is not necessary today. This dual system (reference and primary) is
presented for its historical and traditional interest; the direct approach to primary
solutions (by hand methods in chapter 11 and in 13.3, and by computer means in 
15.5) is strongly recommended. 

The attempt made in the preceding sections was to clearly define and distinguish
the two sets of oblique axes. It was recognized that the oblique solution is dependent
upon an initial orthogonal pattern from which clustering of variables might be
discerned, or else some graphical means might be necessary to locate the axes. Then
the primary factors $T_p$ were passed through these clusters of variables, and the
correlations among the factors were determined. Specifically, the analysis of the variables in
terms of these oblique factors was shown both by the correlations with the factors
(structure $S$) and by the coefficients in the linear expressions in terms of the factors
(pattern $P$). Similarly, when the analysis was made in terms of the reference axes $\Lambda_p$,
both the structure $V$ and the pattern $W$ were displayed. It would have been quite
sufficient to have a single oblique solution, consisting of the structure $S$ and the
pattern $P$. However, since the structure $V$ has become so popular it seems important
to bring into clear perspective the meaning of the apparently alternative oblique
solutions. To assist in the ensuing discussion, the notation employed in the oblique
solutions is summarized in Table 13.4.

* It will be noted that this correlation is simply the negative of the correlation between the
factors $T_1$ and $T_2$. From Figure 13.4 it can be seen that cos (angle between $\Lambda_1$ and $\Lambda_2$) = cos
(180° − $\theta_{12}$) = −cos $\theta_{12}$, so that the correlation of one set of axes is the negative of that between
the other set.


Table 13.4

Notation Employed in Bi-orthogonal Oblique Solutions

                                       Initial                                                   Oblique Solution
Type of      Factor                    Orthogonal    Transformation               Factor         --------------------
Solution     Designation               Pattern       Matrix                       Correlations   Structure    Pattern
Sec. 13.3    $T_p$ (Primary)           $A$           $T = (t_{qp})$               $\Phi$         $S$          $P$
Sec. 13.4    $\Lambda_p$ (Reference)   $A$           $\Lambda = (\lambda_{qp})$   $\Psi$         $V$          $W$



Unfortunately, the explicit designations of structure values and pattern values 
have not always been made in connection with oblique solutions, with a resulting 
state of confusion. The Thurstone school of factor analysis frequently refers to “the 
factor matrix V,” implying therein the complete factor solution. This is an extremely 
ambiguous statement when the factors are correlated—there is no unique, single 
matrix! Is it the matrix of factor coefficients, or the matrix of test correlations with 
the factors? Both uses have been made of the term “factor matrix” in an oblique 
solution. The more common meaning of “the factor matrix V” is that defined by 
Thurstone [477, p. 347] as the matrix containing the projections of the test vectors
on a set of oblique reference axes $\Lambda_p$, which meaning is carried in the present text.

Along with the ambiguous use of a single matrix to describe the two distinct
components of an oblique solution is the equally ambiguous use of the term factor
“loading.” While this may be perfectly acceptable in the case of an orthogonal
solution, it lacks precision of meaning in an oblique solution. Factor “loading” is not a
mathematical or statistical term meaning either “correlation” or “coefficient,” and
hence has been used inconsistently in both senses. Even when the term is given explicit
meaning in an oblique solution it can never take the place of such well-defined
concepts as correlation and coefficient. Fruchter [137, p. 193] refers to the “loading”
of a variable both for the correlation with a reference axis and for the coefficient of a
primary factor, probably because of the relationship between the structure of the one
solution and the pattern of the other, which is indicated below in (13.43).

The confusion is not simple-minded but results from some rather subtle
considerations. The fact of the matter is that a mathematical relationship exists between the
structure $V$ and the pattern $P$ (as well as between $S$ and $W$). Hence, in some sense it is
unnecessary to introduce $P$, because all of its properties can be inferred from $V$.
While this is true, it certainly complicates the interpretation of the primary-factor
pattern to have to rely on the reference-factor structure for an indication of its values,
i.e., to have to look at the structure in Table 13.3 to infer the properties of the pattern
in Table 13.1. Now the exact mathematical relationships will be developed.

From (13.26), the initial orthogonal factor pattern can be expressed in terms of the
primary pattern,

(13.38) $A = PT'$,

and from (13.30), it can also be expressed in terms of the reference structure,

(13.39) $A = V\Lambda^{-1}$.

Equating these two expressions yields:

(13.40) $PT' = V\Lambda^{-1}$.

But, from (13.32),

(13.41) $\Lambda^{-1} = D^{-1}T'$,

so that (13.40) becomes:

(13.42) $PT' = VD^{-1}T'$.

Finally, the relationship between $P$ and $V$ is given by:

(13.43) $P = VD^{-1}$ or $V = PD$.

Thus, for a given reference structure matrix and correlations between the
corresponding primary and reference factors, the primary pattern matrix is determined.
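
Relationship (13.43) can be checked against the tabled values. The following is a minimal sketch, assuming NumPy; it uses the reference-structure rows for variables 1 to 4 from Table 13.3, the corresponding primary-pattern rows from Table 13.1, and the correlation .8741 between corresponding primary and reference factors. Since all inputs are rounded to three or four places, agreement is expected only to about the third decimal.

```python
import numpy as np

d = 0.8741                      # correlation between corresponding T and Lambda factors
D = np.diag([d, d])

# Reference structure V (Table 13.3) and primary pattern P (Table 13.1), variables 1-4.
V = np.array([[0.774, 0.052], [0.839, -0.032], [0.813, -0.045], [0.773, 0.025]])
P_table = np.array([[0.885, 0.059], [0.960, -0.036], [0.929, -0.053], [0.884, 0.028]])

# Primary pattern recovered from the reference structure, as in (13.43).
P = V @ np.linalg.inv(D)

max_diff = np.abs(P - P_table).max()
print(np.round(P, 3))
print(round(float(max_diff), 4))
```

Because $D$ is diagonal, the operation amounts to dividing each column of $V$ by the corresponding diagonal element, which is why the zeros of $V$ are also zeros of $P$.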

In a similar fashion, the initial orthogonal factor pattern can be expressed in terms 
of the primary structure from (13.21), and also in terms of the reference pattern from 
(13.37). Then the same kind of mathematical analysis, employing the relationship 
(13.32) between the two transformation matrices, yields: 

(13.44) $S = WD$ or $W = SD^{-1}$.

This relationship, like (13.43), indicates that the structure of one type of oblique 
solution is rather simply related to the pattern of the other solution, and vice versa. 
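
The analysis leading to (13.44) parallels that for (13.43) and can be sketched as follows, assuming $T$ and $\Lambda$ are nonsingular and using the fact that the diagonal matrix $D$ equals its own transpose.

```latex
\begin{aligned}
A &= S\,T^{-1}                       && \text{from (13.21)} \\
A &= W\,\Lambda'                     && \text{from (13.37)} \\
\Lambda' &= \bigl[(T')^{-1}D\bigr]' = D\,T^{-1} && \text{from (13.32)} \\
S\,T^{-1} &= W\,D\,T^{-1}            && \text{equating the two expressions for } A \\
S &= W\,D, \qquad W = S\,D^{-1}.
\end{aligned}
```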

While (13.43) and (13.44) establish the exact relationships across the two types of 
oblique solutions, they do not replace the need for the clear distinction between the 
structure and pattern for either type of oblique solution. Of course, the structure and 
pattern have unique meanings, and serve rather distinct purposes, and it is in this 
sense that they complement each other in providing complete understanding of an 
oblique solution. In general, for a set of positively correlated primary factors, the 
factor structure will contain all positive entries, while the factor pattern will have 
high positive values and many values near zero (see Table 13.1). The primary-factor 
structure is useful in the estimation of factors (see chapter 16). On the other hand, it 
does not provide a very good indication of “saturation” of the variables with the 
factors. The primary-factor pattern gives this precisely, and thereby is most useful 
for identification of the factors.

By way of illustrating these points, consider variable 1 (Height) in the example of
Table 13.1. From the first line of the factor structure, its correlation with the factor
$T_1$ (Lankiness) is found to be .914 and its correlation with the factor $T_2$ (Stockiness),
.489. However, its “saturation” with these factors can best be obtained from the
linear equation:

$$z_1 = .885\,T_1 + .059\,T_2,$$

which comes from the first line of the factor pattern. The direct contributions of factors
$T_1$ and $T_2$ to the variance of variable 1 are $(.885)^2 = .783$ and $(.059)^2 = .003$,
respectively, while $2(.885)(.059)(.4858) = .051$ is attributable to the joint influence of the
two factors. In other words, 78.3 per cent of the Height variable is attributable to the
Lankiness factor, while only 0.3 per cent is attributable to the Stockiness factor and
5.1 per cent to the joint influence of these factors (leaving 16.3 per cent of the total
variance of variable 1 unaccounted for by the common factors).

Similar interpretations could be made of the values in Table 13.3. But actually, 
nobody proposes the reference axes as useful factors. The Thurstone school would 
exhibit the “factor matrix V” for purposes of identifying the factors. To this end, it is 
just as effective as the pattern matrix P, although from (13.43) it can be seen that the 
values in P are larger than those in V because each correlation in the diagonal matrix 
D is less than unity. However, for overall clarity and understanding, the complete 
primary-factor solution of Section 13.3 is recommended. 




14 

Analytical Methods for the Multiple-Factor 
Solution: Orthogonal Case 


14.1. Introduction 

The last two chapters were devoted to methods of transforming some initial factor 
solution to another “preferred” type of solution. The multiple-factor solutions, as 
developed in these chapters, are dependent in large measure on subjective judgments, 
and the transformations to them cannot be written explicitly from the qualitative 
conditions for “simple structure.” Many attempts have been made to reduce the 
principles of simple structure to an objective form, and hence to develop an objective 
procedure for calculating a (simple structure) multiple-factor solution. These attempts 
date back to the 1930’s, but it was not until 1953 that a real breakthrough was
accomplished by Carroll [69].

One might wonder if objective procedures are really necessary when some of the
solutions in chapter 13 (and also in chapter 11) seem so elegant even with the crude
subjective procedures. These appear as choice solutions only because there was
clear-cut clustering of variables into groups. Unfortunately, not all variables lend
themselves to ready grouping, no matter how well an experiment is designed.
Furthermore, a scientific methodology cannot be dependent on subjective operations. Now
sound, objective, efficient procedures are available for determining a multiple-factor
solution for any set of data. Analytical methods for transforming any initial
solution to a simple-structure solution are presented in this chapter for the case of
orthogonal factors and in the following chapter for the case of oblique factors.

Before developing the objective procedures a brief discussion of some
semi-analytical methods is presented in 14.2. This is done in order to provide some
historical perspective to the problem of finding an analytical solution. Also in this
section is developed the fundamental rationale for the analytical methods. In 14.3 and
14.4 specific procedures are presented which lead to orthogonal simple-structure
solutions. Essentially the same basis, the quartimax criterion, was developed
independently by four researchers, and the orthogonal solution derived therefrom is
presented in 14.3 both in theory and with numerical illustrations. An alternative
approach to orthogonal simple structure is developed in 14.4, employing Kaiser’s
varimax criterion. This procedure not only does a better job of approximating the
classical simple-structure principles, but it also tends to lead to factorially invariant
solutions. Again, numerical examples are given to illustrate the methods.

While the breakthrough in the analytical rotational methods came in 1953, the
new procedures could not immediately be used because of the laborious
computations involved. By 1958, however, electronic computer programs were developed
which made their application feasible. In presenting the analytical methods,
consideration was given to separating the theory from the computing outlines, but this was
judged somewhat inappropriate since the practical use of these methods
(beyond two or three factors) is only feasible with high-speed electronic computers.

14.2. Rationale for Analytical Methods 

As noted in chapter 12, many ingenious graphical and mechanical procedures 
were developed for transforming “arbitrary” factor matrices into “meaningful” 
factor matrices satisfying the simple structure principles of 6.2. So long as the criteria 
for simple structure were stated in qualitative terms, rather than precise mathematical 
terms, it was to be expected that the quest for a simple structure solution could only 
lead to a subjective result. To get results that are independent of the particular 
investigator it would be necessary to rephrase the conditions for simple structure. 
That is precisely what has been taking place ever since the principles were first 
enunciated, although done indirectly or implicitly more often than by explicit attack 
on the principles themselves. 

From the very beginning of the application of simple-structure principles, it was 
recognized that the procedure was more of an art than a science. In an endeavour to 
put the rotations on a more objective basis, the first improvements were directed 
toward eliminating graphical procedures. Nonetheless, arbitrary decisions were still 
required to determine “significant” factor loadings, “large” or “near zero” factor 
loadings, “subgroups” of variables, and the like. Of the many such semi-analytical 
solutions proposed, Horst’s [251] was among the first. He follows Thurstone’s early 
principle for simple structure, namely, that each column of the factor matrix should 
have a minimum number of negative values and a maximum number of nearly
vanishing values. He then expresses this condition in analytical terms, as follows [251,
p. 80]: “For a given factor the sum of the squares of significant factor loadings divided 
by the sum of the squares of all the loadings shall be a maximum.” This procedure 
lacks complete objectivity in that a subgroup of variables with “significant” factor 
loadings has to be selected, and since such variables cannot be determined on 
statistical grounds an arbitrary operational definition is introduced. 

Employing Horst’s criterion as a point of departure, Tucker [489] presents a com¬ 
promise procedure employing both analytical and graphical methods. In this method 
the positions of the trial reference axes for subgroups of variables are determined by 
an analytical method, while the subgroups themselves are selected subjectively 
employing the inter-factor graphs as guides. 



MULTIPLE-FACTOR SOLUTION: ORTHOGONAL CASE 14.2 


Ten years later, Thurstone [478] proposed another type of near-objective procedure
involving the minimization of a weighted sum of projections of the test vectors
on a reference vector. The selection of the weights is on a rather arbitrary basis
designed to emphasize near zero projections. The method involves simple computations
and yields results closely approximating the more intuitive graphical methods.

As noted above, the first truly analytical rotation criterion for determining psychologically
interpretable factors was developed by Carroll [69]. He considered Thurstone’s
five principles of simple structure (see 6.2) but immediately ruled out the likelihood
of a single mathematical expression embodying all these characteristics. This
conscious departure was aptly stated by Kaiser [293, p. 188] as “the first attempt to
break away from an inflexible devotion to Thurstone’s ambiguous, arbitrary, and
mathematically unmanageable qualitative rules for his intuitively compelling notion
of simple structure.”


In a similar attempt to objectify the definition of simple structure, Tucker [493]
proposes a list of ten requirements to be satisfied by any such criteria. While his list
provides another indication of the necessity to depart from Thurstone’s qualitative
rules, it also is dependent on subjective judgment in several places. On the basis of
the objective definition of simple structure which he proposes, Tucker develops a
method for the isolation of m “linear constellations” each with dimensionality
(m - 1) when the common-factor space is of m dimensions. His procedure yields
satisfactory results primarily for “those well-designed studies in which the vectors
are concentrated along all hyperplanes” [493, p. 224]. Since the emphasis in this
chapter is on analytical procedures which completely avoid subjective decisions
and are applicable to any initial factor solution, Tucker’s rotational method will not
be treated further.

Several researchers,* working independently, almost simultaneously arrived at
solutions for objectifying the rotational problem in factor analysis.
While three of the investigators apparently were led to their solutions by a rationale
which they considered to be based on Thurstone’s rules, Ferguson attempted “to
develop a logical groundwork which would lead ultimately to an objective analytical
solution and render explicit the meaning of such a solution” [126, p. 288]. His development
centers around the concept of parsimony in factor analysis, including its
philosophical kinship to the term as used in other scientific theories. This concept,
with its widespread application in all aspects of the subject, provides the very foundation
of factor analysis, from a simple definition of its objective to the complex
considerations in the rotational problem.


Parsimony is one of the fundamental standards in selecting a preferred solution
out of the infinitude of possible solutions, and is the basis of the simple-structure
principles. While the notion of parsimony is implicit in much of the work in factor
analysis, it does not always carry an explicit meaning. In regard to the number of
factors, it is perfectly clear what is meant by parsimony. On the other hand, no such
precise meaning attaches to the notion of parsimony in the rotational problem. It
was the full realization of this fact that led Ferguson to his rationale for an analytical
solution to the rotational problem:

“Since factorists in dealing with the rotational problem employ an intuitive concept of
parsimony, the question can be raised as to whether the term can be assigned a precise and
more explicit meaning, which will enable the rotational problem to be more clearly stated
and admit the possibility of a unique and objective solution. This involves explication ...
[which he attributes to Carnap] ... the process of assigning to a vague, ill-defined, and
perhaps largely intuitive concept a precise, explicit and formal meaning ...” [126, p. 282].

Now, Thurstone’s principles of simple structure were devised as an explicit expres¬ 
sion of parsimony in factor analysis, but they do not provide explication in the above 
sense. Ferguson points out three shortcomings: (1) The conditions for simple structure
cannot be represented as terms in a mathematical expression capable of manipulation;
(2) the conditions are discrete rather than continuous; and (3) because of the
discrete formulations in the simple-structure concept it is insufficiently general. He 
therefore proposes an explication of the concept of parsimony so that a unique 
solution to the rotational problem would result. Of course, the extent of agreement 
between solutions obtained by intuitive-graphical methods and by such a precise 
objective method would depend on the nature of the explication. 

A measure of parsimony in the rotational problem can be defined in terms of the 
degree to which the configuration of vectors representing the variables is structured, 
i.e., the position of the configuration on an hypothetical continuum of all possible 
configurations, from the completely chaotic to the ideal configuration in which each 
variable is of unit complexity. As pointed out in 6.2, the configuration of vectors 
serves as the vehicle to get to a particular set of reference axes. In factor analysis the 
structural properties of the configuration are conveyed by the frame of reference. 

While there are many approaches to the problem of assigning a precise mathemat¬ 
ical meaning to a measure of parsimony, Ferguson suggests a simple and attractive 
one. Starting with a single variable, represented by a point, what is the most parsi¬ 
monious description of it, upon rotation of a pair of orthogonal axes? He suggests 
that intuitively the most parsimonious description results when one of the axes passes 
through the point. Approaching this ideal, it can be seen that when the reference 
frame is rotated so that one of the axes approaches the point, the product of the two 
coordinates grows smaller. Continuing this line of reasoning, Ferguson suggests 
that some function of the sum of products of coordinates of a set of collinear points 
might be used as a measure of the amount of parsimony associated with the descrip¬ 
tion of these points. Finally, for the usual situation of positive and negative co¬ 
ordinates, he proposes (in place of the simple sum) the sum of squares of products of 
coordinates as a measure of parsimony. This measure, for the case of n variables and 
m orthogonal factors, is 

$$\sum_{j=1}^{n} \sum_{p<q=1}^{m} (a_{jp} a_{jq})^{2}, \tag{14.1}$$

involving $m(m-1)/2$ sums of $n$ pairs of coordinates.
The expression (14.1) turns out to be closely related to several of the analytical
procedures that were developed independently. It will be convenient to employ the
following notation for the general rotational problem in terms of orthogonal factors:

$$\begin{aligned} A &= (a_{jp}), && \text{initial factor matrix,} \\ B &= (b_{jp}), && \text{final factor matrix,} \\ T &= (t_{qp}), && \text{orthogonal transformation matrix,} \end{aligned} \tag{14.2}$$

so that

$$B = AT. \tag{14.3}$$


When A is carried into B by the orthogonal transformation T, the communality 
of any variable is invariant, i.e., 

$$\sum_{p=1}^{m} b_{jp}^{2} = \sum_{p=1}^{m} a_{jp}^{2} = h_j^{2} \qquad (j = 1, 2, \cdots, n). \tag{14.4}$$

The squared communality of any variable also remains constant, namely, 

$$\left(\sum_{p=1}^{m} b_{jp}^{2}\right)^{2} = \sum_{p=1}^{m} b_{jp}^{4} + 2\sum_{p<q=1}^{m} b_{jp}^{2} b_{jq}^{2} = \text{constant}. \tag{14.5}$$

Then, summing over the n variables produces 

$$\sum_{j=1}^{n}\sum_{p=1}^{m} b_{jp}^{4} + 2\sum_{j=1}^{n}\sum_{p<q=1}^{m} b_{jp}^{2} b_{jq}^{2} = \text{constant}. \tag{14.6}$$

Since the sum of the two terms in this expression must always be the same, it follows 
that when one of these terms increases the other must decrease, and vice versa. Hence, 
a transformation of A which maximizes one of the terms in (14.6) will, at the same 
time, minimize the other. Formula (14.6) provides the relationship between two 
independent approaches to the rotational problem, as will be indicated below. 
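The trade-off expressed by (14.6) can be checked numerically. The following is a minimal sketch in plain Python, assuming a small made-up loading matrix; the data and the helper names `rotate_plane`, `fourth_power_sum`, and `cross_product_sum` are illustrative and do not come from the text. It rotates two orthogonal factors through several angles and prints both terms of (14.6) together with their fixed combination.

```python
import math

def rotate_plane(loadings, phi):
    """Orthogonal rotation of two-factor loadings through the angle phi (radians)."""
    c, s = math.cos(phi), math.sin(phi)
    return [(x * c + y * s, -x * s + y * c) for x, y in loadings]

def fourth_power_sum(loadings):
    """First term of (14.6): sum of fourth powers of all loadings."""
    return sum(b ** 4 for row in loadings for b in row)

def cross_product_sum(loadings):
    """Sum over variables and factor pairs p < q of b_jp^2 * b_jq^2."""
    total = 0.0
    for row in loadings:
        for p in range(len(row)):
            for q in range(p + 1, len(row)):
                total += row[p] ** 2 * row[q] ** 2
    return total

# Hypothetical loadings of three variables on two orthogonal factors.
A = [(0.8, 0.3), (0.6, -0.5), (0.2, 0.7)]

for degrees in (0, 15, 30, 60):
    B = rotate_plane(A, math.radians(degrees))
    first, second = fourth_power_sum(B), cross_product_sum(B)
    # The last column, first + 2 * second, is the same for every angle.
    print(degrees, round(first, 6), round(second, 6), round(first + 2 * second, 6))
```

The constant in the last column is the sum of squared communalities, so any gain in the fourth-power term is paid for exactly by the cross-product term.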

Either term of (14.6), or some function of these terms, could serve as a precise 
mathematical measure of parsimony. Actually, minimization of the second term for 
maximum parsimony is implied in (14.1). On the other hand, Ferguson [126, p. 286] 
suggests that 

$$Q = \sum_{j=1}^{n}\sum_{p=1}^{m} b_{jp}^{4} \tag{14.7}$$

be maximized for maximum parsimony of a factor structure. Of course the value of 
Q depends upon the factor loadings, which vary with the particular positions of the 
reference axes. The most parsimonious solution, in the least-square sense, would seem 
to require that rotation of the frame of reference which makes the value of Q a 
maximum for a given set of data. The theoretical limit is attained when the complexity 
of each variable is unity. This might be said to constitute the maximum degree of 
structure or organization possible for a configuration of variables. 

14.3. Quartimax Method

Without employing the “parsimony rationale” explicitly, and each starting with 
an independent approach, Carroll [69], Neuhaus and Wrigley [379], and Saunders 
[416] arrived at criteria for objective analytical solutions closely related to the 
maximization of the function (14.7). In one way or another, each viewed the rotation 
of axes in order to arrive at simple structure as an attempt to reduce the complexity 
of the factorial description of the variables. The ultimate objective would be a uni¬ 
factor solution, in which each variable would be of complexity one, i.e., involve 
only a single common factor. An orthogonal uni-factor solution is extremely unlikely 
with empirical data (except for the limiting case of only a general factor for the entire 
set of variables, or the case of several mutually uncorrelated group factors as implied 
by a set of correlations as in Figure 6.1). 

If a uni-factor solution were possible, the variance of each variable would result 
from but one factor loading; and a reasonable approach to this ideal would seem to 
require the maximum inequality in the distribution of the variance among the several 
factors for each variable in the factor pattern. In other words, the transformation 
desired is one which will tend to increase the large factor loadings and decrease the 
small ones for each variable of the original factor matrix. In this attempt to increase 
the inequalities among the factor loadings, the size implied is independent of algebraic 
sign. Since absolute values are somewhat awkward for mathematical manipulation, 
Neuhaus and Wrigley [379] propose that the inequalities among squares of factor 
loadings be maximized; or, more specifically, that the variance in the distribution 
of squared factor loadings should be made a maximum by the use of orthogonal 
transformations. Since this approach involves the maximization of fourth powers of 
factor loadings, Burt (in [379]) has suggested the term “quartimax” for this method. 
In this text the term “quartimax method” is used collectively for the several independ¬ 
ent derivations of analytical procedures related to the maximization of the function 
(14.7). 

In the notation of (14.2), the object of the quartimax method is to determine the 
orthogonal transformation T which will carry the original factor matrix A into a 
new factor matrix B for which the variance of squared factor loadings is a maximum. 
From the basic definition (2.4) the variance of the contributions (the squared factor 
loadings) of all m factors to the n variables is simply: 

$$s_{b^{2}}^{2} = \frac{1}{mn}\sum_{j=1}^{n}\sum_{p=1}^{m} \left(b_{jp}^{2} - \overline{b^{2}}\right)^{2}, \tag{14.8}$$

where the mean of all the squared factor loadings is 

$$\overline{b^{2}} = \frac{1}{mn}\sum_{j=1}^{n}\sum_{p=1}^{m} b_{jp}^{2}. \tag{14.9}$$

Upon expansion and simplification, the expression (14.8) reduces to 

$$M = \frac{1}{mn}\sum_{j=1}^{n}\sum_{p=1}^{m} b_{jp}^{4} - \left(\overline{b^{2}}\right)^{2}, \tag{14.10}$$

which Neuhaus and Wrigley [379] designated the quartimax criterion that is to be
maximized. Now, since $\overline{b^{2}}$ remains constant under orthogonal transformation
according to (14.4), and constant terms have no effect on the maximizing process, this
criterion is equivalent to the measure of parsimony (14.7), i.e., simply the maximization
of the sum of fourth powers of factor loadings.

In Carroll’s [69] original development of an objective procedure he focused atten¬ 
tion on criteria 3, 4, and 5 of the simple structure principles (see 6.2). This led him to 
consider the minimization of some sort of inner-product function of the columns of 
the final factor-structure matrix. The criterion which he proposes is that 

$$N = \sum_{p<q=1}^{m}\sum_{j=1}^{n} b_{jp}^{2} b_{jq}^{2} \tag{14.11}$$

be a minimum. It should be noted that, for the orthogonal case, the expression (14.11) 
is identical with the measure of parsimony (14.1) proposed by Ferguson. Also for the 
orthogonal case, minimizing (14.11) is equivalent to maximizing (14.10) according to 
(14.6). In other words, the criterion N = minimum will lead to precisely the same 
results as Q = maximum when the rotated factors are orthogonal. However, Carroll’s 
criterion (14.11) is not restricted to the orthogonal case. As a matter of fact, its applica¬ 
tion generally will lead to oblique factors (see 15.3). 

Again, from a fresh point of view, Saunders [416, p. 5] attempts to objectify the 
rotation to simple structure by maximizing “the proportion of small and large load¬ 
ings, at the expense of medium-sized ones.” Before considering any function of the 
factor loadings, he notes that the direction of scoring a test is irrelevant to simple 
structure, so that the algebraic signs in any column of the factor matrix may be 
changed without affecting the criterion. In order to deal with this sign ambiguity, 
and at the same time preserve the sign relationships of loadings for the same variable, 
he suggests that each variable be considered twice (as originally scored and also 
reflected) in the frequency distribution of factor loadings. Thus, the “doubled” 
distribution of factor loadings is perfectly symmetric and always has a mean of zero. 
Then Saunders proposes as a criterion for a simple structure solution that the kurtosis 
of the “doubled” frequency distribution of rotated factor loadings be a maximum. 
The fourth moment and second moment are easily expressed about the mean of zero, 
so that the kurtosis to be maximized is:* 

$$K = \sum_{j=1}^{n}\sum_{p=1}^{m} b_{jp}^{4} \Bigg/ \left(\sum_{j=1}^{n}\sum_{p=1}^{m} b_{jp}^{2}\right)^{2}. \tag{14.12}$$

It will be recalled that kurtosis is a measure of the flatness or peakedness of a single-humped
distribution. For the present application the special significance of this
measure is that as kurtosis is increased the relative frequencies of the middle (near
zero loadings) and tails (large loadings) of a distribution are increased at the expense
of the intermediate regions. The denominator of (14.12) remains constant under
orthogonal transformation according to (14.4), so that this criterion is equivalent to
each of the preceding in the orthogonal case.

* Since the resulting function is to be maximized, the numerical constants that arise from the
number of observations and from consideration of the “doubled” frequency may be disregarded.
Thus, while the notion of the “doubled” frequency was necessary to the rationale of the method,
the factor loadings actually need not be reflected.
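The equivalence of the four criteria in the orthogonal case can be seen numerically. The sketch below is only an illustration under stated assumptions: the loading matrix is made up, and the function names `criteria` and `rotate_plane` are hypothetical. It evaluates Q of (14.7), M of (14.10), N of (14.11), and K of (14.12) at several orientations of the same two orthogonal factors.

```python
import math

def rotate_plane(loadings, phi):
    """Orthogonal rotation of two-factor loadings through the angle phi (radians)."""
    c, s = math.cos(phi), math.sin(phi)
    return [(x * c + y * s, -x * s + y * c) for x, y in loadings]

def criteria(B):
    """Return (Q, M, N, K) for a loading matrix B given as a list of rows."""
    n, m = len(B), len(B[0])
    fourth = sum(b ** 4 for row in B for b in row)
    square = sum(b ** 2 for row in B for b in row)
    cross = sum(row[p] ** 2 * row[q] ** 2
                for row in B for p in range(m) for q in range(p + 1, m))
    Q = fourth                                          # (14.7)
    M = fourth / (m * n) - (square / (m * n)) ** 2      # (14.10)
    N = cross                                           # (14.11)
    K = fourth / square ** 2                            # (14.12)
    return Q, M, N, K

# Hypothetical loadings of four variables on two orthogonal factors.
A = [(0.8, 0.3), (0.6, -0.5), (0.2, 0.7), (0.5, 0.5)]

for degrees in (0, 10, 20, 30):
    Q, M, N, K = criteria(rotate_plane(A, math.radians(degrees)))
    print(degrees, round(Q, 4), round(M, 4), round(N, 4), round(K, 4))
```

Whenever Q rises from one orientation to the next, M and K rise with it and N falls, since each differs from Q only by constants that are fixed under orthogonal rotation.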

Since the four criteria (Q, M, N, and K) lead to identical results for an orthogonal
solution, the theory in this section will be developed simply for the maximization of
the sum of fourth powers of factor loadings (Q), following the procedure of Neuhaus
and Wrigley [379]. The notation of (14.2) is employed, with b’s representing all
intermediate values resulting from rotation of the a’s as well as the final factor loadings.
For any variable $z_j$, the orthogonal transformation in the plane of factors p
and q through an angle $\varphi$ will carry the original coordinates (a’s) into the new
coordinates (b’s), as follows:

$$\begin{aligned} b_{jp} &= a_{jp}\cos\varphi + a_{jq}\sin\varphi, \\ b_{jq} &= -a_{jp}\sin\varphi + a_{jq}\cos\varphi, \end{aligned} \tag{14.13}$$


according to the basic equations of transformation in a plane (12.13). In order to
measure the effect of such a transformation on the overall criterion $Q$, the sum of
fourth powers of the new loadings of the two rotated factors is determined. This sum
is defined by

$$Q_{pq} = \sum_{j=1}^{n} \left(b_{jp}^{4} + b_{jq}^{4}\right), \tag{14.14}$$

which depends on the parameter $\varphi$. The object is to determine the angle of rotation
($\varphi_{pq}$ in precise notation) for any pair of factors p and q which will make the sum $Q_{pq}$
a maximum. Then the product of the transformations of all combinations of pairs of
factors produces a transformed factor matrix:

$$B = A\,T_{12} T_{13} \cdots T_{pq} \cdots T_{(m-1)m}, \tag{14.15}$$

where $p = 1, 2, \cdots, (m-1)$, and the associated $q = p+1, p+2, \cdots, m$. The complete
set of $m(m-1)/2$ pairings of p and q is called a cycle. In each cycle of operations
the value of Q for the entire matrix is as large as or larger than the preceding
sum. Since the theoretical maximum of fourth powers of factor loadings cannot
exceed n (even when the total unit variance of each variable is analyzed), the procedure
must converge after a sufficient number of cycles.

For any rotation $T_{pq}$ the angle $\varphi$ which will make $Q_{pq}$ a maximum can be determined
as follows: (a) substitute the expressions for the b’s from (14.13) into (14.14); (b) differentiate
(14.14) with respect to $\varphi$; (c) set the derivative equal to zero; and (d) solve
the equation for $\varphi$. The results [379, p. 83] can be put in the form:

$$\tan 4\varphi = \frac{2\sum_{j=1}^{n} (2a_{jp}a_{jq})(a_{jp}^{2} - a_{jq}^{2})}{\sum_{j=1}^{n}\left[(a_{jp}^{2} - a_{jq}^{2})^{2} - (2a_{jp}a_{jq})^{2}\right]}. \tag{14.16}$$
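Steps (a) to (d) can be written out compactly. The following is a sketch of one way to carry them through, not necessarily the route taken in [379]; the abbreviations $u_j$, $v_j$, $\nu$, and $\delta$ are introduced here only for convenience and are assumptions of this sketch.

```latex
\begin{aligned}
u_j &= a_{jp}^{2} - a_{jq}^{2}, \qquad v_j = 2a_{jp}a_{jq}, \\
b_{jp}^{2} + b_{jq}^{2} &= a_{jp}^{2} + a_{jq}^{2}, \qquad
b_{jp}^{2} - b_{jq}^{2} = u_j\cos 2\varphi + v_j\sin 2\varphi, \\
b_{jp}^{4} + b_{jq}^{4} &= \tfrac{1}{2}\left(b_{jp}^{2} + b_{jq}^{2}\right)^{2}
                         + \tfrac{1}{2}\left(b_{jp}^{2} - b_{jq}^{2}\right)^{2}, \\
Q_{pq}(\varphi) &= \tfrac{3}{4}\sum_{j=1}^{n}\left(a_{jp}^{2} + a_{jq}^{2}\right)^{2}
                 + \tfrac{1}{4}\left(\delta\cos 4\varphi + \nu\sin 4\varphi\right), \\
\nu &= 2\sum_{j=1}^{n} u_j v_j, \qquad \delta = \sum_{j=1}^{n}\left(u_j^{2} - v_j^{2}\right), \\
\frac{dQ_{pq}}{d\varphi} &= \nu\cos 4\varphi - \delta\sin 4\varphi = 0
\quad\Longrightarrow\quad \tan 4\varphi = \frac{\nu}{\delta}.
\end{aligned}
```

The second line follows from substituting (14.13), and the fourth from the identity $u_j^{2} + v_j^{2} = (a_{jp}^{2} + a_{jq}^{2})^{2}$ together with the double-angle formulas. Only the bracketed term in $Q_{pq}(\varphi)$ depends on the angle, which is why the solution involves nothing but $4\varphi$.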
When (14.14) is expanded and the terms involving $\varphi$ are collected and simplified
they are found to involve $\sin 4\varphi$ and $\sin^{2} 2\varphi$. Since each of these terms has the period
$\pi/2$ so does $Q_{pq}(\varphi)$. Therefore the solution (14.16) need only be considered for $\varphi$
between 0° and 90°. Actually, experience has shown that subsequent reflection of
factors can be reduced by requiring $\varphi$ to be between -45° and +45°.

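While any solution (14.16) yields a critical value of $\varphi$, necessary for a maximum
value of $Q_{pq}$, such a critical value may produce a minimum, or stationary value of
$Q_{pq}$ as well. A sufficient condition for a maximum is a negative value of the second
derivative of the function when the critical value is substituted in it. With $\nu$ and $\delta$
denoting the numerator and denominator of (14.16), the conditions
for a maximum can be summarized in the form:

$$\begin{aligned} \frac{dQ_{pq}}{d\varphi} = 0: &\quad \nu\cos 4\varphi - \delta\sin 4\varphi = 0, \\ \frac{d^{2}Q_{pq}}{d\varphi^{2}} < 0: &\quad -\delta\cos 4\varphi - \nu\sin 4\varphi < 0, \end{aligned} \tag{14.17}$$

from which it follows that

$$-\frac{\delta^{2} + \nu^{2}}{\nu}\,\sin 4\varphi < 0. \tag{14.18}$$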


Actually, the formal work of computing the second derivative can be obviated
since the angle which produces a maximum can be determined from the algebraic
signs of the numerator and denominator of (14.16). In the expression (14.18) the
numerator is always a positive number, so the algebraic sign of the entire expression
is determined completely from the simpler form:

$$\frac{1}{\nu}\,\sin 4\varphi > 0. \tag{14.19}$$

It follows that the numerator of (14.16) and $\sin 4\varphi$ must have the same algebraic signs
if the condition (14.19) for a maximum is to be preserved. Corresponding to each
algebraic sign of the numerator, the denominator may be either positive or negative.
The resulting four possibilities, with the angle of rotation associated with each, are
set forth in Table 14.1.

Table 14.1

Angle of Rotation

     Algebraic signs in (14.16)
Numerator (ν)    Denominator               Sign of     Resulting quadrant          Limits for φ
(and sin 4φ)        (δ)         tan 4φ     cos 4φ      of 4φ

     +               +            +          +         I:    0° < 4φ < 90°         0° to 22.5°
     +               -            -          -         II:   90° < 4φ < 180°       22.5° to 45°
     -               -            +          -         III:  -180° < 4φ < -90°     -45° to -22.5°
     -               +            -          +         IV:   -90° < 4φ < 0°        -22.5° to 0°
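The quadrant selection of Table 14.1 maps directly onto the two-argument arctangent. The following is a minimal sketch, assuming plain Python and a hypothetical function name `quartimax_angle`; it is one convenient way to apply (14.16), not the computing routine of any particular program cited in the text.

```python
import math

def quartimax_angle(pairs):
    """Angle phi (radians, between -45 and +45 degrees) that maximizes the sum of
    fourth powers for one pair of factor columns, given as (a_jp, a_jq) tuples."""
    # Numerator and denominator of (14.16).
    nu = 2.0 * sum((2 * x * y) * (x * x - y * y) for x, y in pairs)
    delta = sum((x * x - y * y) ** 2 - (2 * x * y) ** 2 for x, y in pairs)
    # atan2 places 4*phi in the quadrant where its sine has the sign of nu and its
    # cosine the sign of delta, which is the rule tabulated in Table 14.1.
    return math.atan2(nu, delta) / 4.0

# A single unit-length variable at 30 degrees from the first axis (hypothetical data):
# the best rotation should bring the first axis through the point.
theta = math.radians(30)
print(round(math.degrees(quartimax_angle([(math.cos(theta), math.sin(theta))])), 6))
```

Using `atan2` instead of a plain arctangent avoids a separate lookup of the signs, and it also handles a zero denominator, where $4\varphi$ is exactly ±90°.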

The foregoing theory provides the basis for computation of the quartimax method.
First $\tan 4\varphi$ is calculated from (14.16) and then, from the algebraic signs of $\nu$ and $\delta$,
the angle of rotation $\varphi$ is determined from Table 14.1. Then the matrix of the original
(or last computed) pair of columns p and q of the factor matrix is postmultiplied by
the transformation matrix

$$T_{pq} = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}, \tag{14.20}$$

where $\varphi$ is understood to be the angle of rotation $\varphi_{pq}$ in the plane of the factors p
and q. The result leads to the maximum sum $Q_{pq}$ of fourth powers of the rotated
factor loadings for factors p and q. After rotation of all combinations of factors, the
transformation to the final factor matrix B is accomplished as indicated symbolically
in (14.15). The cycle of operations on all pairings of factors must be repeated as many
times as necessary to assure that the Q, for the full matrix, no longer increases (to the
specified number of decimal places).
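The computing scheme just described can be sketched in a few lines. This is an illustrative outline under stated assumptions, not a reconstruction of the programs mentioned in the text: the function name `quartimax`, its parameters `decimals` and `max_cycles`, and the three-factor loading matrix are all hypothetical.

```python
import math

def quartimax(A, decimals=8, max_cycles=50):
    """Rotate loading matrix A (list of rows) by cycles of pairwise plane rotations.
    Returns (B, Q, cycles_used)."""
    B = [list(row) for row in A]
    m = len(B[0])
    tol = 10.0 ** (-decimals)

    def total_q():
        return sum(b ** 4 for row in B for b in row)

    q_prev = total_q()
    for cycle in range(1, max_cycles + 1):
        # One cycle: all m(m-1)/2 pairings p < q.
        for p in range(m - 1):
            for q in range(p + 1, m):
                nu = 2.0 * sum((2 * r[p] * r[q]) * (r[p] ** 2 - r[q] ** 2) for r in B)
                delta = sum((r[p] ** 2 - r[q] ** 2) ** 2 - (2 * r[p] * r[q]) ** 2
                            for r in B)
                phi = math.atan2(nu, delta) / 4.0      # (14.16) with Table 14.1
                c, s = math.cos(phi), math.sin(phi)
                for r in B:                            # postmultiply by T_pq, (14.20)
                    x, y = r[p], r[q]
                    r[p], r[q] = x * c + y * s, -x * s + y * c
        q_now = total_q()
        if q_now - q_prev < tol:                       # Q no longer increases
            return B, q_now, cycle
        q_prev = q_now
    return B, q_prev, max_cycles

# Hypothetical loadings of five variables on three orthogonal factors.
A_demo = [[0.7, 0.4, 0.2], [0.6, 0.5, -0.1], [0.5, -0.4, 0.3],
          [0.6, -0.3, -0.4], [0.4, 0.1, 0.6]]
B_demo, q_demo, cycles_demo = quartimax(A_demo)
print(cycles_demo, round(q_demo, 6))
```

Because each plane rotation can only raise (or leave unchanged) the sum of fourth powers for its own pair of columns while the other columns are untouched, the total Q never decreases from one cycle to the next.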

In order to illustrate the quartimax method, such a solution will be calculated for
the simple example of the eight physical variables. The centroid solution, taken from
the first edition of this text, is repeated in Table 14.2 as the initial solution. To get
the angle of rotation $\varphi$, formula (14.16) is first applied. There are essentially two types
of terms in this formula: twice the products of corresponding factor loadings
($2a_{j1}a_{j2}$) and the differences of squares of factor loadings ($a_{j1}^{2} - a_{j2}^{2}$), and these are
listed for each variable in Table 14.2. The numerator of (14.16) is twice the sum of
the products of these eight corresponding values, namely,

$$\nu = 2(-.6260) = -1.2520.$$

Table 14.2

Quartimax Solution for Eight Physical Variables
(Initial solution: Centroid)

            Initial solution       Squares          Products     Difference       Final solution
Variable                                                         of squares
   j         a_j1     a_j2      a_j1²    a_j2²     2a_j1a_j2    a_j1² - a_j2²     b_j1     b_j2

   1         .830    -.396      .6889    .1568      -.6574         .5321          .899     .196
   2         .818    -.469      .6691    .2200      -.7673         .4491          .934     .131
   3         .777    -.470      .6037    .2209      -.7304         .3828          .902     .105
   4         .798    -.401      .6368    .1608      -.6400         .4760          .876     .172
   5         .786     .500      .6178    .2500       .7860         .3678          .315     .877
   6         .672     .458      .4516    .2098       .6156         .2418          .250     .774
   7         .594     .444      .3528    .1971       .5275         .1557          .197     .715
   8         .647     .333      .4186    .1109       .4309         .3077          .307     .660

Sum of squares       1.5263                         3.4247        1.1706         3.5563

The sums of squares of the products and the “differences of squares” from the
bottom row of Table 14.2 are used directly in getting the denominator of (14.16):

$$\delta = 1.1706 - 3.4247 = -2.2541.$$

Substituting these values in (14.16) yields:

$$\tan 4\varphi = \frac{-1.2520}{-2.2541} = .5554.$$

Since the numerator and denominator are both negative, the angle $4\varphi$ falls in the
third quadrant according to the third line in Table 14.1. From tables of trigonometric
functions it is found that $4\varphi = -150°\,57'$ and hence the angle of rotation is
$\varphi = -37°\,44'$. The necessary elements for the transformation matrix are
$\sin\varphi = -.6120$ and $\cos\varphi = .7909$, so that (14.20) may be written as

$$T = \begin{pmatrix} .7909 & .6120 \\ -.6120 & .7909 \end{pmatrix}. \tag{14.21}$$
The expression (14.15) for the final factor matrix reduces to the simple postmultiplication
of the initial factor matrix of Table 14.2 by the transformation matrix
(14.21). The resulting quartimax solution appears in the last two columns of Table
14.2. The quartimax criterion for this solution is

$$Q = \sum_{j=1}^{n}\sum_{p=1}^{m} b_{jp}^{4} = 4.091, \tag{14.22}$$

while the corresponding value for the initial solution is only 2.883. In the simple
case of m = 2, only one cycle involving a single rotation leads to convergence.
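The worked example can be retraced in a few lines of plain Python. This is a sketch under one stated assumption: the loading of variable 3 on the first factor is taken as .777, the value implied by the tabled difference of squares .3828 for that variable. The variable names are illustrative.

```python
import math

# Centroid loadings (a_j1, a_j2) of the eight physical variables, from Table 14.2.
# Assumption: a_31 = .777, as implied by the tabled difference of squares .3828.
A = [(.830, -.396), (.818, -.469), (.777, -.470), (.798, -.401),
     (.786, .500), (.672, .458), (.594, .444), (.647, .333)]

products = [2 * x * y for x, y in A]           # column "2 a_j1 a_j2"
differences = [x * x - y * y for x, y in A]    # column "a_j1^2 - a_j2^2"

nu = 2 * sum(pr * d for pr, d in zip(products, differences))           # numerator
delta = sum(d * d for d in differences) - sum(pr * pr for pr in products)  # denominator

phi = math.atan2(nu, delta) / 4.0              # quadrant chosen as in Table 14.1
c, s = math.cos(phi), math.sin(phi)

# Postmultiply by the transformation matrix of (14.20).
B = [(x * c + y * s, -x * s + y * c) for x, y in A]

q_initial = sum(a ** 4 for row in A for a in row)
q_final = sum(b ** 4 for row in B for b in row)

print("nu, delta:", round(nu, 4), round(delta, 4))
print("phi in degrees:", round(math.degrees(phi), 2))
print("Q before and after:", round(q_initial, 3), round(q_final, 3))
for j, (b1, b2) in enumerate(B, start=1):
    print(j, round(b1, 3), round(b2, 3))
```

The printed quantities should agree closely with the values quoted in the text and in Table 14.2; differences of about one unit in the third decimal place of individual loadings are to be expected, since the hand computation rounded the sine and cosine to four places.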

Of course, the foregoing example of only two factors is extremely simple and does 
not bring out all the ramifications of the quartimax procedure, but it does show the 
fundamental properties of such a solution. It approximates a simple structure 
solution even though the small values are not as close to zero as one might like them 
to be. No doubt a much better approximation to simple structure could be obtained 
with an oblique solution (see chap. 15), but if an orthogonal solution is desired the 
result in Table 14.2 is the best possible in the sense of this section. It is interesting to 
note that the intuitive-graphical solution of Table 12.3 is very similar to the analytical 
solution, but it does not satisfy the quartimax criterion quite as well (see ex. 1, chap. 14). 

While the indicated type of computation can be done with conventional punched-card
equipment and desk calculators, the work is very laborious and time-consuming.
However, it can be done most expeditiously on high-speed electronic computers.
Such programs were first written for the IBM 701 (and later model machines), the
Illiac, and several other electronic computers [542]. The machine procedure involves
successive pairings of factors $p < q = 1, 2, \cdots, m$, and carrying through the full
$m(m-1)/2$ transformations for each cycle. When the machine finds the angle of
rotation which will give a maximal sum of fourth powers of loadings of rotated factors,
it makes the transformation only if the angle is greater than some specified value
(perhaps one degree; or only one minute if the computer has very large high-speed
memory capacity). Cycles of operation are continued until the sum of fourth powers
for the entire matrix no longer increases. The convergence is generally rapid in the
first few cycles and then tends to slow down.

The application of the quartimax method to a large problem, for which an electronic
computer was employed, will now be shown. Neuhaus and Wrigley [379] performed
an analytical rotation on the centroid solution (which appeared in the first edition
of this text) of the twenty-four psychological tests, employing the Illiac. Their results,
which took about one minute of computer time, are shown in Table 14.3.* The calculation
of cycles was continued until convergence of Q was obtained to the eighth
decimal place. This required five cycles, but from a practical standpoint the convergence
was adequate at the end of two cycles.

There is rather good agreement between the quartimax solution of Table 14.3 
and the orthogonal multiple-factor solutions obtained by intuitive-graphical methods 
in the first edition of this text, and summarized in Table 14.6. With few exceptions, 
large factor loadings in one case correspond to similarly large values in the other. In 
the quartimax solution, the large values tend to be somewhat larger, and the small 
values smaller than their counterparts in either of the graphical solutions. Exceptions 
are especially noticeable in the third factor where there are fewer well-pronounced 
large loadings in the quartimax solution; and in the first factor (verbal) where even 
the small values are increased in the quartimax solution. This tendency toward a 
general factor is one of the main shortcomings, in the simple-structure sense, of the 
quartimax solution. 


14.4. Varimax Method 


As outlined in the last section, the emphasis in the quartimax method is on sim¬ 
plification of the description of each row, or variable, of the factor matrix. In contra¬ 
distinction, Kaiser [293] places more emphasis on simplifying the columns, or factors, 
of the factor matrix in an attempt to meet the requirements for simple structure. 
Thus, while simplicity of each variable may be attained concurrent with a large 
loading on the same factor, such a general factor is precluded by the simplicity 
constraint on each factor. 

The varimax method proposed by Kaiser [291] is a modification of the quartimax
method which more nearly approximates simple structure. Following his development
[293, p. 190], the simplicity of a factor p is defined as the variance of its squared
loadings, i.e.,

$$s_p^{2} = \frac{1}{n}\sum_{j=1}^{n} \left(b_{jp}^{2}\right)^{2} - \frac{1}{n^{2}}\left(\sum_{j=1}^{n} b_{jp}^{2}\right)^{2} \qquad (p = 1, 2, \cdots, m). \tag{14.23}$$


* In 1958 an independent solution was obtained on an IBM 704 at the System Development 
Corporation, and it was found to agree identically with that in Table 14.3. 
Table 14.3

Quartimax Solution for Twenty-Four Psychological Tests
(Initial solution: Centroid)

Test      Verbal     Speed     Deduction    Memory
  j        M_1        M_2         M_3         M_4

  1        .369       .190        .599        .068
  2        .245       .066        .384        .039
  3        .313       .010        .475        .013
  4        .359       .070        .463       -.012
  5        .806       .135       -.016       -.039
  6        .812       .030        .000        .055
  7        .854       .072       -.044       -.095
  8        .660       .202        .197       -.034
  9        .857      -.058       -.020        .098
 10        .234       .700       -.124        .112
 11        .310       .616        .011        .231
 12        .164       .688        .191       -.012
 13        .350       .569        .323       -.082
 14        .324       .190       -.026        .424
 15        .251       .114        .101        .448
 16        .289       .134        .372        .368
 17        .285       .239        .022        .569
 18        .215       .324        .302        .467
 19        .276       .180        .189        .323
 20        .518       .090        .354        .142
 21        .347       .385        .347        .144
 22        .526       .041        .295        .255
 23        .546       .187        .441        .087
 24        .487       .432        .101        .196

 V_p      5.587      2.422       1.958       1.418


When the variance is at a maximum, the factor has the greatest interpretability or
simplicity in the sense that its components (the b’s) tend toward unity and zero. The
criterion of maximum simplicity of a complete factor matrix is defined as the maximization
of the sum of these simplicities of the individual factors, as follows:

$$s^{2} = \sum_{p=1}^{m} s_p^{2} = \frac{1}{n}\sum_{p=1}^{m}\sum_{j=1}^{n} b_{jp}^{4} - \frac{1}{n^{2}}\sum_{p=1}^{m}\left(\sum_{j=1}^{n} b_{jp}^{2}\right)^{2}. \tag{14.24}$$


The maximization of (14.24) has been called the “raw” varimax criterion by
Kaiser [293] because of his preference for an improved version which is presented
below. From empirical studies he found that the “raw” varimax and the quartimax
analytical methods did not meet the standard of level contributions of factors any
better than the intuitive-graphical methods. Along with this tendency for the contributions
of factors ($V_p$) to have greater dispersion in the analytical solutions, there
was the tendency for the more prominent factors to have larger values in both the
large and small factor loadings than their counterparts in the less prominent factors.
In the intuitive-graphical solutions such systematic bias is generally avoided, yielding
level contributions of factors with more equitable distribution of high and low
factor loadings.

This type of bias is attributed by Kaiser [293] to the divergent weights which
implicitly are attached to the variables by the size of their communalities, namely, 
the square-roots of their communalities. Each variable contributes to the function 
(14.24) as the square of its communality. Hence, a variable with communality twice 
that of another will influence the rotations by four times as much. This means that a 
variable with communality .90 will have a weight four times that of a variable with 
communality .45 in determining the final solution. These relative weights are probably 
quite different from those intuitively assigned in graphical rotations. At Saunders’ 
suggestion, Kaiser modified his original approach by weighting the variables equally 
for purposes of rotation. This is accomplished by extending the vectors representing 
the variables to unit length in the common-factor space, carrying out the rotations, 
and then bringing the vectors back to their original length. 

In place of the “raw” varimax criterion (14.24), the improved standard for rotation 
requires that the final factor loadings be such as to maximize the function:* 


(14.25)    $V = n \sum_{p=1}^{m} \sum_{j=1}^{n} (b_{jp}/h_j)^4 - \sum_{p=1}^{m} \Bigl( \sum_{j=1}^{n} b_{jp}^2/h_j^2 \Bigr)^2 .$


Kaiser refers to this as the “normal” varimax criterion, but since the earlier version
will not be employed in this text, the simple term “varimax” will be understood to
mean (14.25). It should be noted that the adjusted correlations of variables with
factors ($b_{jp}/h_j$) correspond to the correlations r" of (4.55), while the original correlations
($b_{jp}$) correspond to the v' of (4.51) since the factors themselves are of unit length.
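The two criteria can be sketched in a few lines of Python. The function name `varimax_criterion` and the small loading matrix are illustrative assumptions, not taken from the text; the raw form is computed as n² times (14.24), which is the scaling used in (14.25). The example is meant to show the point of the normalization: shrinking one variable's vector changes the raw value but leaves the normal varimax value unchanged.

```python
def varimax_criterion(loadings, normalize=True):
    """n * (sum of fourth powers) - (sum of squared column sums of squares).

    normalize=True divides each row by h_j (square root of its communality),
    giving the "normal" criterion (14.25); normalize=False uses the raw
    loadings, i.e. n^2 times the "raw" criterion (14.24).
    Rows with zero communality are not handled in this sketch.
    """
    n = len(loadings)
    m = len(loadings[0])
    rows = []
    for row in loadings:
        h = sum(b * b for b in row) ** 0.5 if normalize else 1.0
        rows.append([b / h for b in row])
    total = 0.0
    for p in range(m):
        squares = [r[p] ** 2 for r in rows]
        total += n * sum(s * s for s in squares) - sum(squares) ** 2
    return total


# Hypothetical 4 x 2 loading matrix, used only for illustration.
B = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.6], [0.2, 0.9]]
# Same configuration, but the first variable's vector is half as long.
B_scaled = [[0.4, 0.05]] + B[1:]

print(varimax_criterion(B, normalize=False),
      varimax_criterion(B_scaled, normalize=False))   # raw values differ
print(varimax_criterion(B), varimax_criterion(B_scaled))  # normal values agree
```

Because each row is brought to unit length before the sums are formed, every variable enters the normal criterion with the same weight, whatever its communality.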

The computing procedure for a varimax solution is quite similar to that employed
for a quartimax solution in the preceding section, but requires that (14.25) be maximized
instead of (14.7). Factors are rotated two at a time according to the scheme
indicated by (14.15), and the complete cycle of m(m - 1)/2 pairings of factors is
repeated until the value of V to the specified number of decimal places no longer
increases.

It will be convenient to introduce some additional notation in order to make the
subsequent expressions more compact. First, the “normalized” factor loadings of a
variable $z_j$ for a particular pair of factors p and q will be designated by

(14.26)    $x_j = a_{jp}/h_j , \qquad y_j = a_{jq}/h_j ,$

and the rotated loadings by $X_j$, $Y_j$, so that the transformation corresponding to


(14.13) may be put in the form:

(14.27)    $(X_j, Y_j) = (x_j, y_j) \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} ,$

* Since a constant multiplier has no effect on the maximization process, the expression (14.24)
was multiplied by $n^2$ for greater simplicity.


where $\varphi$ is the angle of rotation in the plane of the factors p and q. Since squares and
cross-products of the normalized loadings are required in the computation, the
following notation will be employed:

(14.28)
    $u_j = x_j^2 - y_j^2$
    $v_j = 2 x_j y_j$
    $A = \sum u_j$
    $B = \sum v_j$
    $C = \sum (u_j^2 - v_j^2)$
    $D = 2 \sum u_j v_j$

where all sums are on j from 1 to n.

The required angle of rotation is shown by Kaiser [294, p. 415] to be given by:

(14.29)    $\tan 4\varphi = \dfrac{D - 2AB/n}{C - (A^2 - B^2)/n} .$

Just as before, the solution (14.29) which makes (14.25) a maximum need be considered
only for values of $\varphi$ between -45° and +45°; and the sufficiency conditions
for a maximum lead to the choice of the angle of rotation according to the values in
Table 14.1.
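The pairwise procedure described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the program referred to in the text: the function names and the demonstration matrix are hypothetical, and `math.atan2` is assumed to reproduce the quadrant choice of Table 14.1 by taking the quadrant of 4φ from the signs of numerator and denominator, so that φ falls between -45° and +45°.

```python
import math


def pair_angle(x, y):
    """Angle phi of (14.29) for the normalized loadings x, y of one factor pair."""
    n = len(x)
    u = [xj * xj - yj * yj for xj, yj in zip(x, y)]
    v = [2.0 * xj * yj for xj, yj in zip(x, y)]
    A, B = sum(u), sum(v)
    C = sum(uj * uj - vj * vj for uj, vj in zip(u, v))
    D = 2.0 * sum(uj * vj for uj, vj in zip(u, v))
    numerator = D - 2.0 * A * B / n
    denominator = C - (A * A - B * B) / n
    return math.atan2(numerator, denominator) / 4.0


def criterion(rows):
    """Value of (14.25) for loadings that are already normalized."""
    n, m = len(rows), len(rows[0])
    total = 0.0
    for p in range(m):
        sq = [r[p] ** 2 for r in rows]
        total += n * sum(s * s for s in sq) - sum(sq) ** 2
    return total


def varimax(loadings, places=6, max_sweeps=100):
    """Rotate factor pairs in cycles until V no longer increases."""
    h = [math.sqrt(sum(a * a for a in row)) for row in loadings]
    rows = [[a / hj for a in row] for row, hj in zip(loadings, h)]
    m = len(rows[0])
    best = criterion(rows)
    for _ in range(max_sweeps):
        for p in range(m - 1):
            for q in range(p + 1, m):
                phi = pair_angle([r[p] for r in rows], [r[q] for r in rows])
                c, s = math.cos(phi), math.sin(phi)
                for r in rows:
                    # transformation (14.27) applied to columns p and q
                    r[p], r[q] = r[p] * c + r[q] * s, -r[p] * s + r[q] * c
        value = criterion(rows)
        improved = value - best >= 10.0 ** (-places)
        best = value
        if not improved:
            break
    # remove the normalization by restoring the original vector lengths
    return [[xj * hj for xj in r] for r, hj in zip(rows, h)], best


# Hypothetical 6 x 3 initial solution, for illustration only.
demo = [[.7, .3, .2], [.6, .4, -.1], [.5, -.4, .3],
        [.6, -.5, .1], [.4, .1, -.5], [.5, .2, -.6]]
rotated, V = varimax(demo)
for row in rotated:
    print([round(b, 3) for b in row])
print(round(V, 4))
```

Each plane rotation can only raise the criterion, and the communalities are untouched because the vectors are returned to their original lengths after rotation.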

Perhaps the best way to grasp the computing procedures is to apply them to a
simple example. For this purpose, the same data of the last section will be employed.
In Table 14.4 is indicated an appropriate worksheet for all the data that must be
computed and recorded in the process of carrying out the analytical rotation of one
pair of factors, when working with a desk calculator. The values of $2u_jv_j$ for the
individual variables are not recorded since only their sum D is required, and it can
be accumulated in the computer. Its value is D = -.6930. On the other hand, since
the differences of squares, $u_j^2 - v_j^2$, cannot be accumulated on the ordinary calculator,
they are recorded for each variable. Only those sums (A, B, and C) required in the
calculation of $\varphi$ are recorded in the last row of the table.

Substituting in formula (14.29) produces:

    $\tan 4\varphi = \dfrac{-.6930 - 2(3.8490)(.2809)/8}{-4.0921 - (3.8490^2 - .2809^2)/8} = \dfrac{-.9633}{-5.9341} .$

* It may be noted that expression (14.16) for the quartimax criterion is simply $\tan 4\varphi = D/C$
in terms of the notation (14.28).

Table 14.4

Computation Form for Varimax Solution: Eight Physical Variables

            Initial solution    Square root of    Normalized loadings    Computing parameters
Variable                        communality
    j        a_j1     a_j2          h_j             x_j       y_j        u_j       v_j     u_j^2 - v_j^2
    1        .830    -.396        .9196           .9026    -.4306      .6293    -.7773      -.2082
    2        .818    -.469        .9429           .8675    -.4974      .5051    -.8630      -.4896
    3        .777    -.470        .9081           .8556    -.5176      .4641    -.8857      -.5691
    4        .798    -.401        .8931           .8935    -.4490      .5967    -.8024      -.2878
    5        .786     .500        .9316           .8437     .5367      .4238     .9056      -.6405
    6        .672     .458        .8132           .8264     .5632      .3657     .9309      -.7328
    7        .594     .444        .7416           .8010     .5987      .2832     .9591      -.8397
    8        .647     .333        .7277           .8891     .4576      .5811     .8137      -.3244
   Sum                                                                 3.8490     .2809     -4.0921


In calculating the tangent, the numerator and denominator must be determined
separately so that the algebraic signs may be noted. Then, referring to Table 14.1
for the case of both numerator and denominator negative, the angle $4\varphi$ must be
in the third quadrant. From a table of trigonometric functions, $4\varphi$ = -170° 47', so
that $\varphi$ = -42° 42'. For this angle of rotation, $\sin\varphi$ = -.6782 and $\cos\varphi$ = .7349,
and the transformation matrix is

    $\begin{pmatrix} .7349 & .6782 \\ -.6782 & .7349 \end{pmatrix} .$

Then the normalized rotated loadings are computed, according to (14.27), by
postmultiplying the original normalized loadings by this matrix. The results ($X_j$, $Y_j$)
appear in the first two columns of Table 14.5. Finally, the normalization is removed
by multiplying each value by the appropriate $h_j$ for the row. The resulting varimax
solution appears in the last two columns of Table 14.5. The varimax criterion (14.25)
for this solution is

    $V = 8[3.5331 + 3.5049] - [4.0142^2 + 3.9860^2] = 24.3012,$

for which the necessary sums of squares and fourth-powers of rotated normalized
loadings are taken from the middle two columns of Table 14.5.
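The worksheet computations can be retraced with the following sketch, which is an illustration rather than the original desk-calculator routine. The variable names are assumptions. The first-factor loadings of variables 1, 3, and 8 are entered as .830, .777, and .647, the values implied by the tabulated $h_j$ and normalized loadings; results should agree with the tables only up to the rounding of the printed figures.

```python
import math

# Centroid loadings (a_j1, a_j2) of the eight physical variables.
centroid = [(.830, -.396), (.818, -.469), (.777, -.470), (.798, -.401),
            (.786, .500), (.672, .458), (.594, .444), (.647, .333)]
n = len(centroid)

h = [math.hypot(a1, a2) for a1, a2 in centroid]        # square roots of communalities
x = [a1 / hj for (a1, _), hj in zip(centroid, h)]      # normalized loadings (14.26)
y = [a2 / hj for (_, a2), hj in zip(centroid, h)]

u = [xj ** 2 - yj ** 2 for xj, yj in zip(x, y)]        # notation (14.28)
v = [2 * xj * yj for xj, yj in zip(x, y)]
A, B = sum(u), sum(v)
C = sum(uj ** 2 - vj ** 2 for uj, vj in zip(u, v))
D = 2 * sum(uj * vj for uj, vj in zip(u, v))

numerator = D - 2 * A * B / n                          # formula (14.29)
denominator = C - (A ** 2 - B ** 2) / n
phi = math.atan2(numerator, denominator) / 4           # quadrant taken from the signs

cos_phi, sin_phi = math.cos(phi), math.sin(phi)
X = [xj * cos_phi + yj * sin_phi for xj, yj in zip(x, y)]      # transformation (14.27)
Y = [-xj * sin_phi + yj * cos_phi for xj, yj in zip(x, y)]
final = [(Xj * hj, Yj * hj) for Xj, Yj, hj in zip(X, Y, h)]    # normalization removed

# criterion (14.25) from sums of squares and fourth powers
V = (n * (sum(Xj ** 4 for Xj in X) + sum(Yj ** 4 for Yj in Y))
     - (sum(Xj ** 2 for Xj in X) ** 2 + sum(Yj ** 2 for Yj in Y) ** 2))

print(round(A, 4), round(B, 4), round(C, 4), round(D, 4))
print(round(math.degrees(phi), 2), round(V, 2))
for b1, b2 in final:
    print(round(b1, 3), round(b2, 3))
```

With only two factors there is a single pair to rotate, so one application of (14.29) already yields the varimax solution.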

A comparison of the varimax solution with the previous solutions is most enlightening.
Of course, the original centroid solution is very poor from a simple
structure point of view; and this is indicated by the value of only .4078 for the varimax
criterion. On the other hand, the intuitive-graphical multiple-factor solution of
Table 12.3 is almost as good as the varimax solution, having a varimax criterion value of 24.27. Looked at
the other way, the varimax solution of Table 14.5 comes closest to an orthogonal
simple-structure solution which was arrived at intuitively in Table 12.3. The quartimax
solution of Table 14.2 is also a close approximation to the desired end, but it

Table 14.5

Varimax Solution for Eight Physical Variables

(Initial solution: Centroid)

                   Rotated normalized
Variable               loadings               Squares            Final solution
    j               X_j       Y_j         X_j^2     Y_j^2        b_j1     b_j2
    1              .9554     .2957        .9128     .0874        .879     .272
    2              .9749     .2228        .9504     .0496        .919     .210
    3              .9798     .1999        .9600     .0400        .890     .182
    4              .9611     .2760        .9237     .0762        .858     .246
    5              .2560     .9666        .0655     .9343        .238     .900
    6              .2254     .9744        .0508     .9495        .183     .792
    7              .1826     .9832        .0333     .9667        .135     .729
    8              .3431     .9393        .1177     .8823        .250     .684
   Sum                                   4.0142    3.9860
   Sum of squares                        3.5331    3.5049       3.316    2.648

does not meet the varimax criterion quite as well as the graphical method (see ex. 3,
chap. 14).

As in the preceding section, the work of computing a varimax solution without the 
aid of high-speed electronic computers becomes prohibitive for problems involving 
more than a few factors. However, computer programs for the varimax rotation are 
available for all present-day computers, this being the most popular means of getting 
an orthogonal multiple-factor solution. At many computing centers, the varimax 
program is part of a “factor analysis package” which has the principal-factor method
for the initial solution. 

A varimax solution for the example of five socio-economic variables was obtained 
by the use of such a factor analysis program package. Starting with the correlation 
matrix from Table 2.2, first the principal-component solution of Table 8.1 is determined
and then, specifying two factors for rotation, the varimax solution of Table
14.6 is obtained. Of course, the individual factor coefficients are altered as the result 
of the rotation, as is also the contribution of each of the two factors. However, the 
variance of each variable, and the total contribution of the two factors remain the 
same. Put another way, the proportion of the total variance accounted for by either 
set of factors is unaltered. As noted before, once the space is determined by the 
choice of the number of principal components, a rotation to a new basis in the same 
space has no effect on the lengths of the vectors representing the variables. The 
square of the length of each vector is its variance, according to (4.52), and is given in 
the last column of Table 14.6. 
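The invariance just described can be checked numerically. The sketch below uses the loadings printed in Table 14.6; the helper names and the arbitrary 30° rotation are assumptions introduced only for illustration. The row sums of squares (variances) and the total contribution are unchanged by the rotation, while the individual factor contributions are redistributed.

```python
import math

# Varimax loadings (M1, M2) of the five socio-economic variables, Table 14.6.
loadings = [(.01602, .99377), (.94079, -.00883), (.13702, .98006),
            (.82479, .44714), (.96821, -.00604)]


def row_variances(rows):
    """Squared length of each variable's vector in the two-factor space."""
    return [m1 * m1 + m2 * m2 for m1, m2 in rows]


def contributions(rows):
    """Sum of squared loadings for each of the two factors."""
    return [sum(r[p] ** 2 for r in rows) for p in range(2)]


def rotate(rows, degrees):
    """Orthogonal rotation of both factors in their plane."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [(m1 * c + m2 * s, -m1 * s + m2 * c) for m1, m2 in rows]


turned = rotate(loadings, 30.0)   # 30 degrees is an arbitrary choice

print([round(v, 5) for v in row_variances(loadings)])
print([round(v, 5) for v in row_variances(turned)])
print([round(c, 5) for c in contributions(loadings)],
      round(sum(contributions(loadings)), 5))
print([round(c, 5) for c in contributions(turned)],
      round(sum(contributions(turned)), 5))
```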

The varimax solution is indeed a multiple-factor solution satisfying the simple
structure criteria of 6.2. Of course, for so few variables the conditions cannot be

Table 14.6

Varimax Solution for Five Socio-Economic Variables (a)
(Initial solution: first two principal components, Table 8.1)

Variable                        M_1        M_2       Variance
    1                         .01602     .99377      .98783
    2                         .94079    -.00883      .88515
    3                         .13702     .98006      .97930
    4                         .82479     .44714      .88022
    5                         .96821    -.00604      .93747
Contribution of factor       2.52182    2.14815     4.66997
Percent of total variance      50.4       43.0        93.4

(a) The only reason for showing five decimal places (of the computer output) is to provide a means for
checking numerical calculations.

taken too literally. The factor weights of 0 or 1 in the first decimal place certainly 
may be viewed as essentially zero. In this sense the factor matrix exhibits one zero 
in each row, with the exception of variable 4; two zeros in each column; and several 
variables whose entries vanish in one column but not in the other. Even the exception 
of variable 4 is not contradictory to the set of simple-structure criteria (see number 5). 
Again it should be emphasized that the simple-structure criteria have compelling 
intuitive value but lack the precision necessary for mathematical computation. The 
varimax solution, on the other hand, is a precisely defined method which indeed 
approximates orthogonal simple structure. 

An example of a varimax solution for a large problem is presented for the twenty-four
psychological tests so that the varimax solution may be compared with previous
ones for these data. In Table 14.7 the varimax solution is given along with the quartimax
(from Table 14.3) and a subjective graphical solution (from the first edition of
this text). The coefficients are recorded to two decimal places to facilitate comparison.
A quick glance will immediately lead one to the intuitive conclusion that the varimax 
solution somehow meets the vague notions of simple structure better than either of 
the other two solutions. Applying each of the five criteria for simple structure (see 
6.2), it is found that both the varimax and the subjective solutions satisfy them insofar 
as a non-mathematical standard can be satisfied by crude judgment. On the other 
hand, it is quite obvious that the quartimax solution fails to satisfy the standard of 
having at least four zeros in each column—the first factor tending towards a general 
factor as noted in the previous section. 

From the nature of the tests with the large weights, the following names are
suggested for the factors:

    M_1: Verbal
    M_2: Speed
    M_3: Deduction
    M_4: Memory

Table 14.7

Comparison of Varimax, Quartimax, and Subjective Orthogonal
Multiple-Factor Solutions: Twenty-Four Psychological Tests

(Initial solution: Centroid)



. f the third factor may be in ordcr-^Ttie ^ed 

:£n^ 

^ : 

e m cffic content of the test^ tQ ^ ^ „ best „ 

The varimax s°' utl ° best with the intuitive ° f “j index ot relation- 

te sense that it c01 j re Thi can be demonstrated by a goa \ some ideal 

y the graphical solution. This ca q{ these so i u tions has as its& would 







(14.30) 


(V-S), 


f rms 


/ 22 ZE 5 Z 

mn 


where the sum is over all ™ 

b Jp in the S-solution Th^ZT^ 0 ”^ fac,or loadings* i th 

“ f0,, °^ •• mean values computed from Tabten" am 

(14.31) 


(V ~ S) rms = .062, 

6)r* s = -116, 


Although the scaJe is arbitr ^ ^ 127 

ind r d a **» o f 

and the subject™ “ d » almost as g ea ' beT’' “ d diff —ce 

In addition to the fact tha, , fc . the ^timax 

tWs IToTsohlZl °tlat mple f mathematically 

introduced m signlh of 

structure, it may be subject to the° ^ t0 resu,ts that appear SyStem 

because different sets of i' t,? th Same criti cism as the int.wr 6 lke sim P le 

f » the varimax ^ tbeseI ^ ^ ^ metho P ds 

mathematical defi„i,io n iit fj muS < be “ore fundamemai if , The ratio "aIe 

Kaiser [ 293 , p. 1951 wh ° r the Nation problem. Such a hi '• y ‘° provide a 
varimax factors obtained in a sample will have a greater likelihood of portraying
the universe varimax factors. While the general impression may be that simple
structure is the “ultimate” objective of the rotational problem, Kaiser [293, p. 195]
suggests that “the ultimate criterion is factorial invariance.” Because the varimax
solution was devised with a view to satisfying the simple structure criteria and
subsequently found to show this kind of invariance, Kaiser also suggests that
“Thurstone’s intuition was basically on the trail of factorial invariance.” Thus, the
“invariance criterion” is proposed as an alternative or possible improvement to
“simple structure.”

An example of the tendency for the varimax criterion to lead to factorially invariant 
solutions is provided by Kaiser [293, pp. 196-8]. Starting with the initial centroid 
solution for the twenty-four psychological tests, he rotated the first five tests, then 
six, and so on, adding one test at a time until all twenty-four were rotated. It is in 
this sense that he considers the effect on the factor loadings of changes in the composition
of the test battery. For the particular example, the first and third factors
are essentially invariant from the outset while the second and fourth factors fluctuate 
some before becoming stable. 



15 

Analytical Methods for the Multiple-Factor 
Solution: Oblique Case 


15.1. Introduction 

In the last chapter the rationale for analytical solutions was presented, along with
some of the historical developments. Two practical methods, albeit requiring high-speed
computers, were given for the determination of multiple-factor solutions
approximating simple structure under the condition that the factors be uncorrelated.
The restriction of orthogonality is removed in the present chapter.

While the mathematics and the computations become more involved, there is
much greater flexibility in an oblique solution. Of course, an orthogonal (or near-orthogonal)
solution may result out of the more general oblique conditions if, in
fact, the “best” solution should tend toward orthogonality. The conditions set forth
for an oblique solution do not preclude zero correlations among the factors.

The theory and procedures which lead to oblique simple-structure solutions stem 
from the same objective criteria underlying the orthogonal solutions of chapter 14. 
Upon relaxing the restriction of orthogonality several of the criteria of chapter 14 can 
be adapted to produce oblique solutions. In 15.2 one form of the quartimax criterion 
is generalized to produce the oblimax method, involving the maximization of fourth 
powers of oblique structure elements. The original analytical procedure due to 
Carroll—the quartimin method—is developed in 15.3. Further improvements and 
generalizations of that method are presented in 15.4 under the oblimin methods. In 
all these oblique methods a rather involved procedure is followed, wherein the
desired simple structure principles for the primary factor solution are introduced in
an indirect manner. Finally, in 15.5, a direct procedure is presented for getting oblimin-like
solutions. In each of these sections the theory leading to the particular oblique
solution is presented and illustrated with numerical examples.

The basic concepts that arise in the analytical methods of rotation to multiple-factor
solutions, both orthogonal and oblique, are listed in Table 15.1 along with
the symbols employed for them in this text. This summary of concepts and notation

Table 15.1

Concepts and Notation

                                                        Notation
Concept                       Orthogonal Case                      Oblique Case
Original factors              F_p (p = 1, 2, ..., m)               F_p (orthogonal)
Original factor loadings      A = (a_jp) (j = 1, 2, ..., n)        A = (a_jp)
Transformation matrix         T = (t_qp) (p, q = 1, 2, ..., m)     Λ = (λ_qp)
Final solution                B = AT = (b_jp)                      V = AΛ = (v_jp), reference structure matrix
                                                                   P = (b_jp), primary pattern matrix
Final factors                 M_p                                  T_p if primary factors
                                                                   Λ_p if reference axes
Criterion for solution
  Quartimax                   Equivalent expressions:
                                (14.7)   Q = max.
                                (14.10)  M = max.
                                (14.11)  N = min.
                                (14.12)  K = max.
  Varimax                     (14.25)  V = max.
  Oblimax                                                          (15.2)   K = max.
  Quartimin                                                        (15.15)  N = min.
  Covarimin (Oblique Varimax)                                      (15.32)  C = min.
  Oblimin                                                          (15.39)  B = min.
  Binormamin                                                       (15.40)  D = min.
  Direct oblimin                                                   (15.45)  F = min.

should assist in distinguishing the methods in this chapter from those of the preceding 
chapter, and can serve as a ready reference for the basic notation employed in this 
chapter. Specific notation required for the development of the ensuing methods will 
be introduced as needed. 

15.2. Oblimax Method 

It will be recalled that four separate criteria in chapter 14 were found to be equivalent
when the rotated solution is restricted to orthogonal factors. When oblique
factors are permitted, the four criteria are no longer equivalent; and, as a matter of
fact, each method is not immediately generalizable to the oblique case. The criterion
(14.12) can be applied to the oblique case. While the criterion K = maximum is
equivalent to Q = maximum in the orthogonal case (since the denominator of K is
an invariant under orthogonal transformation), the full expression for K must be
used in the oblique case.

The problem is to find a transformation matrix $\Lambda$ which will carry an initial factor
matrix A into a final solution V, as in (13.30):

(15.1)    $V = A\Lambda ,$

in which the elements $v_{jp}$ of the resulting factor structure matrix V will satisfy the
criterion K = maximum, where

(15.2)    $K = \sum_{j=1}^{n} \sum_{p=1}^{m} v_{jp}^4 \Big/ \Bigl( \sum_{j=1}^{n} \sum_{p=1}^{m} v_{jp}^2 \Bigr)^2 .$

This restatement of (14.3) and (14.12) in different notation should make perfectly
clear the distinction between the orthogonal and the oblique case. Also, it should be
noted that in the latter case the term “loading” is used for the structure value, i.e.,
the correlation of the variable with the reference axis:

(15.3)    $v_{jp} = r_{z_j \Lambda_p} .$

An oblique solution obtained under the condition that (15.2) be a maximum will be
called an “oblimax” solution.*
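As a small illustration of criterion (15.2), the following sketch evaluates K for two sets of values for the eight physical variables: the initial centroid loadings and the quartimin reference structure of Table 15.3, for which the text of 15.3 quotes K = .12890. The function name `oblimax_K` is an assumption, and the first-factor centroid loadings of variables 3 and 8 are entered as .777 and .647, the values implied by the other tabulated quantities.

```python
def oblimax_K(V):
    """Criterion (15.2): sum of fourth powers over squared sum of squares."""
    fourth = sum(v ** 4 for row in V for v in row)
    second = sum(v ** 2 for row in V for v in row)
    return fourth / second ** 2


# Initial centroid loadings of the eight physical variables.
centroid = [(.830, -.396), (.818, -.469), (.777, -.470), (.798, -.401),
            (.786, .500), (.672, .458), (.594, .444), (.647, .333)]

# Quartimin reference structure, last two columns of Table 15.3.
quartimin_V = [(.783, .046), (.838, -.024), (.817, -.044), (.770, .027),
               (.007, .814), (-.020, .723), (-.050, .673), (.072, .601)]

print(round(oblimax_K(centroid), 5))
print(round(oblimax_K(quartimin_V), 5))
```

The second value is considerably larger than the first, reflecting the concentration of each variable's structure values on a single reference axis.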

The development of the theory of the oblimax method will be made following the
work of Pinzka and Saunders [392]. The orthogonal projections $v_{jp}$ on an oblique
reference axis $\Lambda_p$ are determined in such a manner as to make

(15.4)    $K_p = \sum_{j=1}^{n} v_{jp}^4 \Big/ \Bigl( \sum_{j=1}^{n} v_{jp}^2 \Bigr)^2 \qquad (p = 1, 2, \ldots, m)$

a relative maximum under rotation in a plane. As the structure values $v_{jp}$ are altered
in successive iterations, the function (15.2) eventually attains a maximum value.
Thus, while (15.4) represents the impact of maximizing the v’s on a single reference
axis, the criterion (15.2) represents the ultimate effect such successive maximizations
have on the final solution V. The oblimax criterion can be expressed in terms of the
factor loadings of the initial solution and the direction cosines in the matrix of
transformation, and thereby the latter values can be determined.

To simplify the development, the coordinates of any point in the plane of the first
two factors of the original rectangular reference system are designated (a, b) instead
of $(a_{j1}, a_{j2})$. Then the elements $v_{jp}$ resulting from the transformation (15.1) can be
expressed as follows:

(15.5)    $v_{jp} = a\lambda_{1p} + b\lambda_{2p} \qquad (p = 1, 2, \ldots, m),$

according to (13.20). Since the direction cosines must satisfy the conditions

(15.6)    $\sum_{k=1}^{m} \lambda_{kp}^2 = 1 \qquad (p = 1, 2, \ldots, m),$

it suffices to consider the ratio of the two in (15.5) as a single unknown, $x = \lambda_{2p}/\lambda_{1p}$,
and rewrite the expression, apart from the common factor $\lambda_{1p}$ (which cancels in the ratio (15.4)), in the form:†

(15.7)    $v_{jp} = a + bx .$

* This term was suggested by Saunders. 

† It is tacitly assumed that $\lambda_{1p} \neq 0$, otherwise the ratio would not be defined. Of course, if
$\lambda_{1p} = 0$ then $\lambda_{2p} = 1$ and the rotated axis is coincident with the second axis of the original
reference frame.

Substituting (15.7) into (15.4), and without designating the factor p, the criterion to
be maximized becomes:

(15.8)    $K = \dfrac{\sum (a + bx)^4}{\bigl[ \sum (a + bx)^2 \bigr]^2} ,$

where all summations are understood to extend over the n values of a and b (viz.,
$a_{j1}$ and $a_{j2}$ for j = 1, 2, ..., n). To simplify the work further, the functions of x in the
numerator and denominator are designated by:

(15.9)    $N = \sum (a + bx)^4 , \qquad D = \sum (a + bx)^2 .$

A necessary condition for the maximization of $K = N/D^2$ is the vanishing of its
derivative with respect to x, namely,

(15.10)    $K' = \dfrac{D^2 N' - N(2DD')}{D^4} = 0 ,$

or, more simply,

(15.11)    $f(x) = DN' - 2ND' = 0 ,$

where f(x) is used to designate the resulting function of x. Upon substituting the
values (15.9) and their derivatives into (15.11), the resulting equation would in general
be of the fifth order, but fortuitously the coefficient of $x^5$ is zero. Representing the
quartic equation by

(15.12)    $\alpha_4 x^4 + \alpha_3 x^3 + \alpha_2 x^2 + \alpha_1 x + \alpha_0 = 0 ,$

its coefficients are given by

(15.13)
    $\alpha_4 = \sum ab \sum b^4 - \sum b^2 \sum ab^3 ,$
    $\alpha_3 = \sum a^2 \sum b^4 + 2 \sum ab \sum ab^3 - 3 \sum b^2 \sum a^2b^2 ,$
    $\alpha_2 = 3 \sum a^2 \sum ab^3 - 3 \sum b^2 \sum a^3b ,$
    $\alpha_1 = 3 \sum a^2 \sum a^2b^2 - 2 \sum ab \sum a^3b - \sum b^2 \sum a^4 ,$
    $\alpha_0 = \sum a^2 \sum a^3b - \sum ab \sum a^4 .$

While these coefficients appear rather imposing at first glance, a certain symmetry of
form will be detected upon more careful scrutiny.
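The coefficients of the quartic can be checked against the definition (15.11) by direct evaluation. In the sketch below the function names are assumptions, the data are the centroid loadings of the eight physical variables (with .777 and .647 entered for variables 3 and 8), and the coefficients are taken in the form $\sum ab \sum b^4 - \sum b^2 \sum ab^3$, and so on, which corresponds to f(x) with the common numerical factor 4 removed.

```python
def quartic_coefficients(a, b):
    """Coefficients (alpha_4, ..., alpha_0) of the quartic (15.12)."""
    def S(p, q):
        # sum over the variables of a^p * b^q
        return sum(ai ** p * bi ** q for ai, bi in zip(a, b))

    alpha4 = S(1, 1) * S(0, 4) - S(0, 2) * S(1, 3)
    alpha3 = S(2, 0) * S(0, 4) + 2 * S(1, 1) * S(1, 3) - 3 * S(0, 2) * S(2, 2)
    alpha2 = 3 * S(2, 0) * S(1, 3) - 3 * S(0, 2) * S(3, 1)
    alpha1 = 3 * S(2, 0) * S(2, 2) - 2 * S(1, 1) * S(3, 1) - S(0, 2) * S(4, 0)
    alpha0 = S(2, 0) * S(3, 1) - S(1, 1) * S(4, 0)
    return alpha4, alpha3, alpha2, alpha1, alpha0


def quartic(coef, x):
    """Value of the polynomial (15.12) at x, by Horner's rule."""
    value = 0.0
    for c in coef:
        value = value * x + c
    return value


def f_direct(a, b, x):
    """D*N' - 2*N*D' computed straight from the definitions (15.9)."""
    N = sum((ai + bi * x) ** 4 for ai, bi in zip(a, b))
    D = sum((ai + bi * x) ** 2 for ai, bi in zip(a, b))
    N1 = sum(4 * bi * (ai + bi * x) ** 3 for ai, bi in zip(a, b))
    D1 = sum(2 * bi * (ai + bi * x) for ai, bi in zip(a, b))
    return D * N1 - 2 * N * D1


a = [.830, .818, .777, .798, .786, .672, .594, .647]
b = [-.396, -.469, -.470, -.401, .500, .458, .444, .333]
coef = quartic_coefficients(a, b)
for x in (-2.0, -0.5, 0.0, 0.7, 3.0):
    print(x, round(f_direct(a, b, x), 6), round(4 * quartic(coef, x), 6))
```

The two printed columns should agree at every x, which also confirms that no fifth-degree term survives.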

Any one of the four roots of (15.12), if real, will produce either a maximum or a
minimum of K. When all four roots $x = \lambda_{2p}/\lambda_{1p}$ are real and distinct, they correspond
to two relative maxima and two relative minima of K. From calculus it is known that
those roots of (15.12) for which the second derivative K" is negative make K a
maximum. Differentiating $K' = f/D^3$ produces

(15.14)    $K'' = \dfrac{D^3 f' - f(3D^2 D')}{D^6} = \dfrac{f'}{D^3} ,$

where the last equality follows from f = 0 as stated in (15.11). Since D is a sum of
squares and is always positive, the algebraic sign of K" is the same as that of f'.

Then the sufficient condition for a maximum involves the substitution of a root of
(15.12) into the derivative of the same function to note if the result is negative. Actually
the derivative of (15.12) need not be determined explicitly because its behavior in the
vicinity of the root can be determined from the polynomial itself. If the polynomial is
increasing in the neighborhood of the root then K must be at a minimum; if decreasing,
then K must be at a maximum. The sign of $\alpha_4$ determines the sign of the entire
polynomial as x increases without limit. Therefore, for the largest root K will be a
maximum if $\alpha_4$ is negative, and a minimum if $\alpha_4$ is positive. Once it is known whether
the largest root produces a maximum or a minimum the same information is known
for each of the other roots since the roots alternate between maxima and minima.
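The rule for distinguishing maxima from minima can be illustrated numerically. The sketch below is only one possible arrangement and is not the procedure of Pinzka and Saunders: as an assumption made for convenience, the roots are located by scanning the angle θ with x = tan θ, using the fact that the quartic multiplied by cos⁴θ has the same sign as the quartic itself. All function names are hypothetical, and the data are again the centroid loadings of the eight physical variables.

```python
import math


def coefficients(a, b):
    """Coefficients (alpha_4, ..., alpha_0) of the quartic (15.12)."""
    def S(p, q):
        return sum(ai ** p * bi ** q for ai, bi in zip(a, b))
    return (S(1, 1) * S(0, 4) - S(0, 2) * S(1, 3),
            S(2, 0) * S(0, 4) + 2 * S(1, 1) * S(1, 3) - 3 * S(0, 2) * S(2, 2),
            3 * S(2, 0) * S(1, 3) - 3 * S(0, 2) * S(3, 1),
            3 * S(2, 0) * S(2, 2) - 2 * S(1, 1) * S(3, 1) - S(0, 2) * S(4, 0),
            S(2, 0) * S(3, 1) - S(1, 1) * S(4, 0))


def scaled_quartic(coef, theta):
    """Quartic (15.12) times cos^4(theta), where x = tan(theta)."""
    s, c = math.sin(theta), math.cos(theta)
    a4, a3, a2, a1, a0 = coef
    return a4 * s ** 4 + a3 * s ** 3 * c + a2 * s * s * c * c + a1 * s * c ** 3 + a0 * c ** 4


def classify_roots(a, b, steps=3600):
    """Real roots of (15.12), each labelled 'max' or 'min' for K."""
    coef = coefficients(a, b)
    found = []
    low = -math.pi / 2 + 1e-9
    width = (math.pi - 2e-9) / steps
    for k in range(steps):
        t0, t1 = low + k * width, low + (k + 1) * width
        f0 = scaled_quartic(coef, t0)
        if f0 * scaled_quartic(coef, t1) < 0:
            for _ in range(60):                      # bisection
                mid = (t0 + t1) / 2
                if scaled_quartic(coef, t0) * scaled_quartic(coef, mid) <= 0:
                    t1 = mid
                else:
                    t0 = mid
            # polynomial decreasing through the root -> K is at a maximum
            found.append((math.tan((t0 + t1) / 2), "max" if f0 > 0 else "min"))
    return coef, found


def K_of_x(a, b, x):
    """Criterion (15.8) for a given ratio x."""
    v = [ai + bi * x for ai, bi in zip(a, b)]
    return sum(t ** 4 for t in v) / sum(t * t for t in v) ** 2


a = [.830, .818, .777, .798, .786, .672, .594, .647]
b = [-.396, -.469, -.470, -.401, .500, .458, .444, .333]
coef, roots = classify_roots(a, b)
print("alpha_4 =", round(coef[0], 5))
for x, kind in roots:
    print(round(x, 4), kind, round(K_of_x(a, b, x), 5))
```

The label of the largest root can be compared with the sign of $\alpha_4$, and the labels of the remaining roots should alternate as the text describes.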

The two roots of (15.12) that correspond to maxima yield the direction cosines of
the new reference axes, and the analytical rotation is carried out accordingly. This
process is repeated for all pairs of factors p = 1, 2, ..., m - 1 and q = p + 1, p + 2,
..., m, until K stabilizes at its maximum value (to a specified number of decimal
places).

The oblimax method, just as all the other analytical procedures, requires a vast 
amount of computing and hence becomes impractical without high-speed computing 
facilities. Pinzka and Saunders [392, pp. 11-30] present a flow chart of the computations
and outline a computer program for a hypothetical electronic computer. An
oblimax program was written by Kern W. Dickman for the Illiac. Such a solution 
was obtained on this computer for the twenty-four psychological variables, and is 
exhibited in Table 15.2. The oblimax reference structure matrix V and its associated 
primary-factor pattern matrix P are recorded in this table from the direct output of 
the computer (except for the reflection of two factors and rounding to three decimal 
places for convenience of printing). The criterion (15.2) was K = .02364 for the 
initial centroid loadings and increased to a maximum K = .04168 for the structure 
values of the oblimax matrix V. The correlations among the reference factors and 
among the primary factors also are outputs of the computer. 

The principal purpose of presenting Table 15.2 is to show the feasibility of an 
oblimax solution on a large electronic computer. It is also of interest to note that 
this objective solution is very similar to one that had been obtained by intuitive-graphical
methods.

15.3. Quartimin Method 

The next analytical procedure for an oblique solution is the one originally introduced
by Carroll [69]. The criterion is N = minimum, just as in (14.11) but without
the constraint that the factors be orthogonal in the rotated solution. Carroll now
suggests the name “quartimin” for this method since it involves the minimization of
terms of the fourth degree, viz., the sum of cross-products of squared factor
“loadings”.*

* Of course, for an oblique solution the use of the term “loading” must be made explicit (see
13.5), and here it stands for the structure value as given by (15.3).

Table 15.2

Oblimax Solution for Twenty-Four Psychological Tests
(Initial solution: Centroid)



minimum, where „ . 

N= I 

t15 15) ed to terms of the initial 

For any row; o 




( l> e resulting element of v „ 

(15.16) ° fV ’ a ^‘o (J 5, tls 


JP / ( 

so that (15.15) can h * =I 

be ex pressed intheform 

(1 $ 1 ' 7 \ 


£ a »K,, 


(1S.17) 


nt n , 

»„„ „ ^li v -f- 

«“*'‘■‘SSSS’SS E' 1 " 4 fc **» o, 

'**" ““ *■ ^-^.rss^cw-js 

(15.18) n Jt changes 

(l 5 . J9) ,Cft ,s '"dependent 


W 

<*> * *1 

N *~ t w-v 2 . 

W,JI simplify (he fnl!„ • “ 

StrUCtUre column 

(15.21) x is, according 

ir 


^ “ 2 <&, 


noni a chunnr^ -a '"Viun 

0 5.21 j dnge m is, accordir 

Hence, corresponding to , h „ V ' = AA *' 

(I5. 22) m ° fSqUared «™«ure values is 

V' V - A , a , 

* These condition * * ~ ^A'AA, 
and the entire expression (15.20) becomes:

(15.23)    $N_x = \Lambda_x' A' W A \Lambda_x ,$

where W is a diagonal matrix of the n scalars $w_j$. Define a new matrix

(15.24)    $C = A'WA ,$

so that (15.23) may be written

(15.25)    $N_x = \Lambda_x' C \Lambda_x .$

The problem, then, is to minimize (15.25) under the condition

(15.26)    $\Lambda_x' \Lambda_x = 1$

corresponding to (15.6) for the reference vector $\Lambda_x$. This can be accomplished by the
method of Lagrange's multipliers, yielding the characteristic equation of C, namely:

(15.27)    $\begin{vmatrix} (c_{11} - N_x) & c_{12} & \cdots & c_{1m} \\ c_{21} & (c_{22} - N_x) & \cdots & c_{2m} \\ \vdots & \vdots & & \vdots \\ c_{m1} & c_{m2} & \cdots & (c_{mm} - N_x) \end{vmatrix} = 0 .$

Any latent root of this m-th order equation will make the determinant vanish;
and in order to minimize $N_x$, the smallest latent root is selected. Then the elements of
the latent vector associated with this root are the desired solutions $\lambda_{px}$. When the
smallest $N_x$ is determined from equation (15.27), it is substituted into the following
set of homogeneous linear equations (from which the condition (15.27) was derived):

(15.28)
    $(c_{11} - N_x)\lambda_{1x} + c_{12}\lambda_{2x} + \cdots + c_{1m}\lambda_{mx} = 0 ,$
    $c_{21}\lambda_{1x} + (c_{22} - N_x)\lambda_{2x} + \cdots + c_{2m}\lambda_{mx} = 0 ,$
    $\cdots$
    $c_{m1}\lambda_{1x} + c_{m2}\lambda_{2x} + \cdots + (c_{mm} - N_x)\lambda_{mx} = 0 .$

The resulting solutions $\lambda_{px}$ are all proportional to one arbitrary solution and by
applying the condition (15.26) the direction cosines for the new axis are obtained
which will make the criterion $N_x$ a minimum.
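A single step of this kind may be sketched as follows, for any number of factors m ≥ 2. The function name is hypothetical, and two choices are assumptions of the sketch rather than of the text: the weights are taken as $w_j = \sum_{p \neq x} v_{jp}^2$ (the form that reduces to $a_{j2}^2$ in the first iteration of the numerical example), and the smallest latent root of C is found by power iteration on (trace C)·I - C instead of by solving the characteristic equation.

```python
import math


def quartimin_step(A, L, x, iterations=500):
    """Return (N_x, new column x of the transformation matrix).

    A is the n x m initial factor matrix, L the current m x m
    transformation matrix (columns are reference vectors), and x the
    index of the column to be replaced.
    """
    n, m = len(A), len(L)
    # structure values v_jp = sum_k a_jk * lambda_kp
    V = [[sum(A[j][k] * L[k][p] for k in range(m)) for p in range(m)]
         for j in range(n)]
    # weights from all columns except x
    w = [sum(V[j][p] ** 2 for p in range(m) if p != x) for j in range(n)]
    # C = A'WA, as in (15.24)
    C = [[sum(w[j] * A[j][r] * A[j][s] for j in range(n)) for s in range(m)]
         for r in range(m)]
    trace = sum(C[r][r] for r in range(m))
    vec = [1.0] * m
    for _ in range(iterations):
        # multiply by (trace*I - C); its dominant vector belongs to the
        # smallest latent root of C
        nxt = [trace * vec[r] - sum(C[r][s] * vec[s] for s in range(m))
               for r in range(m)]
        norm = math.sqrt(sum(c * c for c in nxt))
        vec = [c / norm for c in nxt]                 # condition (15.26)
    N_x = sum(vec[r] * C[r][s] * vec[s] for r in range(m) for s in range(m))
    return N_x, vec


# Centroid solution for the eight physical variables (two factors).
A = [[.830, -.396], [.818, -.469], [.777, -.470], [.798, -.401],
     [.786, .500], [.672, .458], [.594, .444], [.647, .333]]
identity = [[1.0, 0.0], [0.0, 1.0]]
N1, column = quartimin_step(A, identity, 0)
print(round(N1, 4), [round(c, 4) for c in column])
```

For the first iteration of the eight-variable example the result should be close to the root and direction cosines worked out by hand in the text.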

The theory of the quartimin method, as outlined above, clearly implies a tremendous
amount of computing. The work involved in preparation for the iterations, determining
C, and the large number of iterations that are usually required, indicate that only
with the capability of a high-speed electronic computer is a quartimin rotation
feasible. The only practical exception is the case of only two common factors.

More will be said about computer programs for the quartimin method in 15.4. To
illustrate the preceding theory, however, a simple numerical example will be employed;
and to make comparisons possible, the example of the eight physical variables
again is used. The centroid solution for this example is taken as the initial solution A
and appears in Table 15.3.


Table 15.3

Quartimin Solution for Eight Physical Variables

(Initial solution: Centroid)

                                      Elements of W for first
            Initial solution: A       two iterations                    Structure: V
Variable                              x = 1            x = 2
    j        a_j1      a_j2           w_j = a_j2^2     w_j = a_j1^2     v_j1      v_j2
    1        .830     -.396           .1568            .6889            .783      .046
    2        .818     -.469           .2200            .6691            .838     -.024
    3        .777     -.470           .2209            .6037            .817     -.044
    4        .798     -.401           .1608            .6368            .770      .027
    5        .786      .500           .2500            .6178            .007      .814
    6        .672      .458           .2098            .4516           -.020      .723
    7        .594      .444           .1971            .3528           -.050      .673
    8        .647      .333           .1109            .4186            .072      .601


The object of the iteration process is successively to alter the values of the trans¬ 
formation matrix 


(15.29) 



^-12 

^- 22 - 


in such a manner as to reduce the value of N, as given by (15.17) to a minimum. 
The process is begun by arbitrarily selecting the first column to be the vector A x 
for which the values A 21 are going to be changed. Having designated x - 1, e 
values (15.19) are determined next. These w ; - values are simply the respective a j2 lor 
the first iteration, and are also given in Table 15.3. The diagonal matrix W is made 
up of these elements. Then the matrix multiplications m (15.24) are carried out to 

produce: 


“ .8561 -.0294“ 

.0294 .3053. 


The minimization process leads to the characteristic equation

$\begin{vmatrix} (.8561 - N_x) & -.0294 \\ -.0294 & (.3053 - N_x) \end{vmatrix} = 0,$

or

$N_x^2 - 1.1614 N_x + .2605 = 0,$

from which the smallest root is found to be $N_x = .3038$. For this value of $N_x$,
equations (15.28) become

$.5523\lambda_{11} - .0294\lambda_{21} = 0,$

$-.0294\lambda_{11} + .0015\lambda_{21} = 0.$

From the first of these equations,

$\lambda_{21} = \frac{.5523}{.0294}\,\lambda_{11},$

and, upon applying the condition (15.6), viz.,

$\frac{.305035}{.000864}\,\lambda_{11}^2 + \lambda_{11}^2 = 1,$

there results $\lambda_{11}^2 = .002824$, and hence $\lambda_{21}^2 = .997176$. The direction cosines of the
new position of the first reference axis are $\lambda_{11} = .0531$ and $\lambda_{21} = .9986$, while the
second axis remains unchanged. The impact of this first transformation is to reduce
the value of N from .8561 for the initial solution to .3038.
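
The arithmetic of this first iteration can be retraced with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the loadings are the rounded centroid values of Table 15.3, and the smallest root is taken from the closed-form solution of the two-by-two characteristic equation rather than from any particular program.

```python
import math

# Centroid loadings (a_j1, a_j2) of the eight physical variables, Table 15.3.
A = [(.830, -.396), (.818, -.469), (.777, -.470), (.798, -.401),
     (.786, .500), (.672, .458), (.594, .444), (.647, .333)]

def weighted_cross_products(A, w):
    """Elements (c11, c12, c22) of the symmetric matrix C = A'WA, W = diag(w)."""
    c11 = sum(wj * a1 * a1 for (a1, a2), wj in zip(A, w))
    c12 = sum(wj * a1 * a2 for (a1, a2), wj in zip(A, w))
    c22 = sum(wj * a2 * a2 for (a1, a2), wj in zip(A, w))
    return c11, c12, c22

def smallest_eigenpair(c11, c12, c22):
    """Smallest root N of |C - N*I| = 0 and a unit-length eigenvector."""
    trace = c11 + c22
    det = c11 * c22 - c12 * c12
    disc = max(trace * trace - 4.0 * det, 0.0)
    root = (trace - math.sqrt(disc)) / 2.0
    # First row of (C - N*I) x = 0 gives x proportional to (-c12, c11 - N).
    x1, x2 = -c12, c11 - root
    if math.hypot(x1, x2) < 1e-12:      # first row vanished; use the second
        x1, x2 = c22 - root, -c12
    if math.hypot(x1, x2) < 1e-12:      # C is a multiple of I; any axis works
        x1, x2 = 1.0, 0.0
    norm = math.hypot(x1, x2)
    return root, (x1 / norm, x2 / norm)

# First iteration (x = 1): the weights are the squared second-column loadings.
w = [a2 * a2 for _, a2 in A]
c11, c12, c22 = weighted_cross_products(A, w)
N_x, (lam11, lam21) = smallest_eigenpair(c11, c12, c22)
print(round(c11, 4), round(c12, 4), round(c22, 4))
print(round(N_x, 4), round(lam11, 4), round(lam21, 4))
```

Because the inputs are rounded to three decimals, the printed root may differ from the text in the fourth decimal place.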

The second iteration is made by taking the second column of (15.29) to be the
vector $\Lambda_x$ and using the new values of the first column for the remainder of the
transformation matrix. In this case, a new matrix C is set up, for which the
characteristic equation is

$\begin{vmatrix} (.8245 - N_x) & .0605 \\ .0605 & (.3037 - N_x) \end{vmatrix} = 0.$

The least root of this equation is $N_x = .2968$, a slight drop from the previous value
of .3038. Proceeding as before in applying the condition (15.6), the new direction
cosines for the second reference axis are found to be $\lambda_{12} = -.1139$ and $\lambda_{22} = .9935$.

After seven iterations the value of N is found to be stable at N = .0065. These
iterations changed the values of the elements in the transformation matrix until the
following were reached:

$\begin{pmatrix} .4756 & -.5432 \\ .8797 & .8396 \end{pmatrix}$

The signs of the second column are reflected and the two columns are interchanged
in order to present the final results in the same order as previously. The
transformation matrix may then be written:

(15.30)    $\Lambda = \begin{pmatrix} .5432 & .4756 \\ -.8396 & .8797 \end{pmatrix}$

Postmultiplying the initial solution A by this matrix produces the factor structure of
the quartimin solution in Table 15.3.

The quartimin structure V is very similar to both the intuitive-graphical solution
in Table 13.3 and to the oblimax solution (which was given in the first edition of this
text), the difference between corresponding elements being no greater than .003 in
either case. In spite of the close agreement, the differences among these solutions
stem from fundamental differences in the basis underlying each one. It is of interest
to note that the oblimax criterion for the quartimin solution of Table 15.3 is
K = .12890. Of course this cannot be quite as large as the maximum value, K = .12891,
for the oblimax solution (although the difference is only in the fifth decimal place).
Theoretically the present solution is not as good as the one obtained according to
the oblimax criterion. On the other hand, the quartimin criterion for Table 15.3 is
N = .0065 while a similar computation for the oblimax solution yields N = .0068.
In the same sense, the present solution is better than the oblimax solution according
to the quartimin criterion. While the foregoing differences were rather academic,
more pronounced differences can be expected in situations where the clusters are not
so well-defined as in the simple example of eight physical variables.
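
The complete iteration of this section, for the two-factor case, may be sketched as follows. This is a minimal illustration and not the desk-calculator routine itself: the names are hypothetical, the starting transformation is assumed to be the identity matrix (which reproduces the initial N = .8561), and the columns are simply alternated for a fixed number of steps instead of testing for convergence.

```python
import math

# Centroid loadings (a_j1, a_j2) of the eight physical variables, Table 15.3.
A = [(.830, -.396), (.818, -.469), (.777, -.470), (.798, -.401),
     (.786, .500), (.672, .458), (.594, .444), (.647, .333)]

def smallest_eigenvector(c11, c12, c22):
    """Unit eigenvector belonging to the smallest root of a 2x2 symmetric C."""
    trace, det = c11 + c22, c11 * c22 - c12 * c12
    root = (trace - math.sqrt(max(trace * trace - 4.0 * det, 0.0))) / 2.0
    x1, x2 = -c12, c11 - root
    if math.hypot(x1, x2) < 1e-12:
        x1, x2 = c22 - root, -c12
    if math.hypot(x1, x2) < 1e-12:
        x1, x2 = 1.0, 0.0
    norm = math.hypot(x1, x2)
    return x1 / norm, x2 / norm

def structure(A, columns):
    """V = A * Lambda, with Lambda stored as a list of its two columns."""
    return [tuple(a1 * c[0] + a2 * c[1] for c in columns) for a1, a2 in A]

def quartimin_N(V):
    """Quartimin criterion for two factors: sum over j of v_j1^2 * v_j2^2."""
    return sum(v1 * v1 * v2 * v2 for v1, v2 in V)

def iterate_column(A, columns, x):
    """Replace column x of Lambda, holding the other column fixed."""
    V = structure(A, columns)
    w = [row[1 - x] ** 2 for row in V]          # weights from the fixed column
    c11 = sum(wj * a1 * a1 for (a1, a2), wj in zip(A, w))
    c12 = sum(wj * a1 * a2 for (a1, a2), wj in zip(A, w))
    c22 = sum(wj * a2 * a2 for (a1, a2), wj in zip(A, w))
    columns[x] = smallest_eigenvector(c11, c12, c22)

columns = [(1.0, 0.0), (0.0, 1.0)]              # assumed start: Lambda = I
history = [quartimin_N(structure(A, columns))]
for step in range(100):
    iterate_column(A, columns, step % 2)        # x = 1, 2, 1, 2, ...
    history.append(quartimin_N(structure(A, columns)))

print([round(n, 4) for n in history[:8]])
print(round(history[-1], 4), columns)
```

Each step can only lower N or leave it unchanged, since the new column minimizes the quadratic form that equals N when the other column is held fixed. The signs and order of the final columns are arbitrary, which is why the text reflects and interchanges them before writing (15.30).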

15.4. Oblimin Methods 

In the preceding section an approximation to a simple-structure solution was 
obtained by minimizing the expression (15.15) involving terms of the fourth degree. 
Carroll [71] has generalized his original criterion to a whole class of methods for 
oblique transformations to simple structure which involve minimization of certain 
expressions. This class of methods, involving oblique factors and a minimizing 
criterion, is designated by the term “oblimin.” Carroll arrived at the general class 
of oblimin solutions from a consideration of his quartimin criterion and Kaiser’s 
oblique version of the varimax criterion [291]. 

The criterion for the latter solution, which Kaiser [293] considers the most obvious
way of relaxing the restriction of orthogonality in his original varimax method, is
the minimization of the function:*

(15.31)    $C^* = \sum_{p<q=1}^{m} \left( n \sum_{j=1}^{n} v_{jp}^2 v_{jq}^2 - \sum_{j=1}^{n} v_{jp}^2 \sum_{j=1}^{n} v_{jq}^2 \right),$

i.e., minimization of the covariances of squared elements of the factor structure V.
In the orthogonal case the criterion (14.24) is equivalent to (15.31). Just as (14.24) is
replaced by (14.25), so Kaiser [293, p. 198] suggests replacing (15.31) by

(15.32)    $C = \sum_{p<q=1}^{m} \left[ n \sum_{j=1}^{n} (v_{jp}^2/h_j^2)(v_{jq}^2/h_j^2) - \sum_{j=1}^{n} v_{jp}^2/h_j^2 \sum_{j=1}^{n} v_{jq}^2/h_j^2 \right]$

involving “normalized loadings.” This implies the normalization by rows of the
initial factor matrix, carrying through the transformation in terms of the extended
vectors, and then reducing the matrix V thus obtained to the original-length vectors
by multiplying the elements in rows of V by the square root of the respective
communalities.

* Also called the “covarimin” criterion by Carroll [70].

Both Kaiser and Carroll have found from empirical investigations that neither
the quartimin nor the covarimin method works very well. The trouble is that the
latter procedure is almost invariably biased toward factor axes which are too
orthogonal, while the former procedure is just as biased in the opposite direction,
toward factor axes which are too highly correlated. Since covarimin tends to be “too
orthogonal” and quartimin “too oblique,” Carroll proposed the following
“biquartimin” criterion as a compromise:

(15.33)    $B^* = N + C^*/n = \text{minimum},$

where N is defined in (15.15) and C* in (15.31). In the last term the expression
including division by n is retained, whereas it had been dropped as immaterial to
the minimization of (15.31). The rationale for (15.33) is the simultaneous
minimization of two quantities, each of which has a valid justification.

Subsequently, Carroll [71] generalized this simple sum of the two separate criteria
to permit varying weights of the “quartimin” and “covarimin” components. This was
accomplished by introducing variable parameters $\alpha$ and $\beta$ as follows:

(15.34)    $B^* = \alpha N + \beta C^*/n = \text{minimum}.$

Then, substituting the criteria N and C* from (15.15) and (15.31), the new criterion
becomes

(15.35)    $B^* = \sum_{p<q=1}^{m} \left[ \alpha \sum_{j=1}^{n} v_{jp}^2 v_{jq}^2 + \frac{\beta}{n} \left( n \sum_{j=1}^{n} v_{jp}^2 v_{jq}^2 - \sum_{j=1}^{n} v_{jp}^2 \sum_{j=1}^{n} v_{jq}^2 \right) \right].$

Collecting terms, this may be written

(15.36)    $B^* = \sum_{p<q=1}^{m} \left[ (\alpha + \beta) \sum_{j=1}^{n} v_{jp}^2 v_{jq}^2 - \frac{\beta}{n} \sum_{j=1}^{n} v_{jp}^2 \sum_{j=1}^{n} v_{jq}^2 \right].$

Dividing this expression by the sum of the weights (and multiplying by n, which is
immaterial to the minimization) and setting

(15.37)    $\gamma = \beta / (\alpha + \beta),$

the general oblimin criterion is given by:

(15.38)    $B^* = \sum_{p<q=1}^{m} \left[ n \sum_{j=1}^{n} v_{jp}^2 v_{jq}^2 - \gamma \sum_{j=1}^{n} v_{jp}^2 \sum_{j=1}^{n} v_{jq}^2 \right] = \text{min.},$

or, in terms of normalized loadings,

(15.39)    $B = \sum_{p<q=1}^{m} \left[ n \sum_{j=1}^{n} (v_{jp}^2/h_j^2)(v_{jq}^2/h_j^2) - \gamma \sum_{j=1}^{n} v_{jp}^2/h_j^2 \sum_{j=1}^{n} v_{jq}^2/h_j^2 \right] = \text{min.}$

It will be noted that the covarimin criterion (15.31) is a special instance of (15.38),
corresponding to the value $\gamma = 1$ of the arbitrary parameter.* Similarly, the quartimin
criterion corresponds to $\gamma = 0$ and the biquartimin to $\gamma = .5$. These special
instances of the general oblimin criterion are summarized as follows:

    Quartimin:    $\gamma = 0$;    most oblique,
    Biquartimin:  $\gamma = .5$;   less oblique,
    Covarimin:    $\gamma = 1$;    least oblique.

* Or in terms of normalized elements, (15.32) is a special instance of (15.39).
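
How the single parameter generates the three named criteria can be seen in a short sketch of (15.38) and (15.39). The function name and its optional `communalities` argument are assumptions made for illustration; the structure values are the rounded entries of Table 15.3.

```python
def oblimin_criterion(V, gamma, communalities=None):
    """General oblimin criterion: (15.38), or (15.39) if communalities are given.

    V is a list of rows (one per variable) of reference-structure values;
    communalities, if supplied, holds h_j^2 for each row.
    """
    n, m = len(V), len(V[0])
    h2 = communalities if communalities is not None else [1.0] * n
    # Squared (optionally normalized) structure values v_jp^2 / h_j^2.
    sq = [[v * v / h for v in row] for row, h in zip(V, h2)]
    total = 0.0
    for p in range(m):
        for q in range(p + 1, m):                     # pairs with p < q
            cross = sum(row[p] * row[q] for row in sq)
            sum_p = sum(row[p] for row in sq)
            sum_q = sum(row[q] for row in sq)
            total += n * cross - gamma * sum_p * sum_q
    return total

# Quartimin structure of Table 15.3 (rounded values).
V = [(.783, .046), (.838, -.024), (.817, -.044), (.770, .027),
     (.007, .814), (-.020, .723), (-.050, .673), (.072, .601)]

quartimin = oblimin_criterion(V, 0.0)      # equals n times N of (15.15)
biquartimin = oblimin_criterion(V, 0.5)
covarimin = oblimin_criterion(V, 1.0)      # equals C* of (15.31)
print(quartimin, biquartimin, covarimin)
```

Since the criterion is linear in the parameter, the biquartimin value always lies midway between the quartimin and covarimin values computed for the same structure.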


Of course, any other value of $\gamma$ may be used, with a corresponding variation in
the criterion and in the solution resulting therefrom. Based upon empirical evidence,
Carroll [71, p. 3] suggests “that results will be most generally satisfactory when
$\gamma = .5$.”

Another analytical procedure employing a minimizing criterion was proposed by
Kaiser and Dickman [97, ...]; it has many similarities to the oblimin class in its
formal expression, namely:*

(15.40)    $D = \sum_{p<q=1}^{m} \left[ \frac{\sum_{j=1}^{n} (v_{jp}^2/h_j^2)(v_{jq}^2/h_j^2)}{\left( \sum_{j=1}^{n} v_{jp}^2/h_j^2 \right) \left( \sum_{j=1}^{n} v_{jq}^2/h_j^2 \right)} \right] = \text{min.}$

This criterion resulted from Kaiser's attempt to resolve the undetermined parameter
$\gamma$ in the oblimin class of solutions. The criterion (15.40) provides a solution
which corrects for the “too oblique” bias of the quartimin criterion and the “too
orthogonal” bias of the covarimin criterion without arbitrarily taking $\gamma = 1/2$ [...].
In comparison with the biquartimin, Kaiser and Dickman [...] simple or extremely
complex, criterion (15.40) is better; but if the data are moderately [...].
Kaiser and Dickman summarize [...] the underlying three dimensions of the tests,
and the [...] between their analytical solution and [...] subjective-graphical
solution is almost uncanny. Kaiser and Dickman conclude that the degree of
simplicity of structure inherent in the data accounts for these results.

An example of a binormamin solution is given later, but a detailed mathematical
development involving the criterion (15.40) is not included.
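
Although the development is omitted, evaluating the criterion (15.40) for a given structure is straightforward. The following sketch does only that; the function name is hypothetical and no minimization is attempted.

```python
def binormamin_criterion(V, communalities):
    """Evaluate (15.40): for each pair p < q, the sum of cross-products of the
    normalized squared structure values divided by the product of their sums."""
    m = len(V[0])
    sq = [[v * v / h for v in row] for row, h in zip(V, communalities)]
    total = 0.0
    for p in range(m):
        for q in range(p + 1, m):
            cross = sum(row[p] * row[q] for row in sq)
            sum_p = sum(row[p] for row in sq)
            sum_q = sum(row[q] for row in sq)
            total += cross / (sum_p * sum_q)
    return total

# Two artificial structures: a perfectly simple one and a completely mixed one.
simple = [(1.0, 0.0), (0.0, 1.0)]
mixed = [(0.5, 0.5), (0.5, 0.5)]
print(binormamin_criterion(simple, [1.0, 1.0]))   # no variable is shared
print(binormamin_criterion(mixed, [0.5, 0.5]))    # every variable is shared
```

The ratio form makes the criterion insensitive to the overall size of the column sums, which is what removes the need for a weighting parameter.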

* In order to avoid confusion with the oblimax criterion, the symbol “D” is employed here
instead of the symbol “K” used by Kaiser and Dickman.

Returning to the class of oblimin solutions given by the criterion (15.39), it is
obvious that the task of computing such a factor solution is very difficult. Since
there is little likelihood that the oblimin methods would be employed with desk
calculators, the detailed calculations are not presented [...]. An efficient computer
program has been prepared [...] (for the IBM 704 originally, but subsequently
adapted to the IBM 7094 [...] computers), and a general outline will be given of
[...] these calculations.

Somewhat analogous to the procedure described in 15.3, [...] (keeping the other
columns unchanged) until a minimum value is found for

(15.41)    $B_x$ [...]

This operation on a single column is designated a “minor cycle” of the iterations.
A set of m minor cycles, one for each of the m columns of $\Lambda$, is called a “major cycle.”
For each minor cycle, the electronic computer program involves the solution of a
characteristic equation, obtaining algebraically an eigenvalue and corresponding
eigenvector of a symmetric matrix [71, pp. 4-6]. From such an eigenvector is derived
a column of the transformation matrix [...] the required $V_x$ of the structure V [...]
is the required minimum value of (15.41). [...] taking successive minor cycles until
a major cycle is accomplished. The iterative process in the computer program involves
successive major cycles until a satisfactory degree of convergence is attained. This is
measured by the amount of change in the total value of the criterion (15.39),
consisting of the sum of the m values [...], from one major cycle to the next; or more
precisely, by the amount of change [...].

[...] for the first example. The minres solution was used as the initial matrix A,
and three different values of the parameter $\gamma$ were employed. The resulting
structure matrices, computed on the IBM 7094, are shown in Table 15.4. It will be
recalled that the quartimin solution for the same example was also obtained by desk
calculator methods in the last section (employing a centroid solution), and the
structure matrix appears in Table 15.3. Of course there are differences in structure
values between the two solutions. They are not only based on different initial
matrices, but also [...] while the second used the normalized criterion (15.39) with
$\gamma = 0$. Of the three solutions in Table 15.4, the covarimin less clearly satisfies
the simple structure principles than either [...]. It may be considered the simplest
because the factors are uncorrelated in this example [...] the quartimin structure may
make it appear more desirable than the biquartimin solution, the latter [...].

Table 15.4

Three Oblimin Solutions for Eight Physical Variables

(Initial solution: Minres, Table 9.3)

Another, more practical application [...] is made to the [...]. [...] normalized and
[...] shrunken again [...]. The output of the computer [...] (4) [...] number of
iterations [...]. [...] the primary factor pattern P must be obtained to complete the
solution. The primary factor pattern P may be determined from the structure V by
employing the relationship (13.43).
What is required is the diagonal matrix D, which is defined in (13.31) in terms of the
transformation matrix T [...].

Table 15.5

Biquartimin Solution for Twenty-Four Psychological Tests

(Initial solution: Centroid)

              Reference Structure: V                 Primary Pattern: P
         Verbal   Speed  Deduction  Memory
Test      Λ_1      Λ_2      Λ_3      Λ_4         T_1      T_2      T_3      T_4

  1       .014     .094     .598     .051        .015     .103     .666     .058
  2       .028     .007     .392     .029        .031     .008     .436     .033
  3       .067    -.052     .498     .000        .074    -.057     .555     .000
  4       .106     .011     .490    -.022        .117     .012     .545    -.025
  5       .675     .110     .075    -.008        .748     .120     .083    -.009
  6       .670    -.016     .089     .084        .742    -.018     .099     .095
  7       .755     .063     .068    -.061        .836     .069     .076    -.069
  8       .449     .159     .256    -.019        .497     .174     .285    -.021
  9       .722    -.111     .079     .130        .800    -.122     .087     .146
 10       .080     .650    -.186     .132        .089     .711    -.208     .149
 11       .074     .525    -.054     .245        .082     .574    -.060     .276
 12      -.072     .641     .132    -.011       -.080     .701     .146    -.012
 13       .074     .524     .307    -.082        .082     .573     .341    -.092
 14       .135     .070    -.069     .437        .149     .077    -.076     .492
 15       .029    -.019     .052     .452        .032    -.021     .058     .508
 16      -.032    -.007     .330     .359       -.035    -.008     .368     .404
 17       .034     .079    -.053     .579        .038     .086    -.059     .652
 18      -.130     .161     .217     .460       -.144     .176     .242     .518
 19       .035     .066     .151     .324        .039     .072     .168     .365
 20       .240    -.002     .378     .143        .265    -.002     .421     .161
 21       .038     .289     .318     .141        .042     .316     .353     .159
 22       .250    -.071     .311     .258        .277    -.078     .346     .291
 23       .223     .095     .464     .085        .247     .104     .516     .095
 24       .234     .341     .082     .211        .260     .373     .091     .237

Factor                       Correlations Among Factors

  1      1.000    -.120    -.233    -.216       1.000     .262     .341     .337
  2               1.000    -.172    -.233                1.000     .295     .338
  3                        1.000    -.192                         1.000     .329
  4                                 1.000                                  1.000

But, from formula (13.31), it follows that

(15.42)    $T' = D\Lambda^{-1},$

which states that T' is obtainable from $\Lambda^{-1}$ by normalizing its rows. The inverse of
the non-symmetric transformation matrix $\Lambda$ may involve considerable computations.
To simplify the process, the symmetric matrix $\Psi$ of correlations among the
reference factors $\Lambda_p$ is set up, namely:

(15.43)    $\Psi = \Lambda'\Lambda,$

and its inverse can be obtained by the square-root method of 3.5. Then the desired
inverse of $\Lambda$ follows simply from the matrix product:

(15.44)    $\Lambda^{-1} = \Psi^{-1}\Lambda'.$

Finally, the algebraic factors required to normalize the rows of $\Lambda^{-1}$ are the values
for the principal diagonal of D.

For the example of twenty-four psychological tests, the matrices $\Lambda^{-1}$ and D are
obtained in ex. 9, chap. 15. The proportionality factors necessary for calculating the
primary pattern are contained in the matrix:

$D^{-1} = \begin{pmatrix} 1.1075 & 0 & 0 & 0 \\ 0 & 1.0939 & 0 & 0 \\ 0 & 0 & 1.1128 & 0 \\ 0 & 0 & 0 & 1.1251 \end{pmatrix}$

Applying formula (13.43), the primary pattern of Table 15.5 is obtained.
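
The chain (15.43), (15.44), (15.42) can be sketched in a few lines. The sketch below makes several assumptions of its own: the helper names are hypothetical, a plain Gauss-Jordan elimination stands in for the square-root method of 3.5, and the two-factor matrix (15.30) is used as the worked case because its results are easy to check by hand. The last lines apply the proportionality factors quoted above to the first row of Table 15.5.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def invert(M):
    """Gauss-Jordan inverse with partial pivoting (stand-in for the
    square-root method, which applies to the symmetric matrix Psi)."""
    n = len(M)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def primary_system(Lam):
    """Return (T', diagonal of D) from the reference transformation Lambda."""
    psi = matmul(transpose(Lam), Lam)                  # (15.43)
    lam_inv = matmul(invert(psi), transpose(Lam))      # (15.44)
    d = [1.0 / math.sqrt(sum(x * x for x in row)) for row in lam_inv]
    t_prime = [[x * dp for x in row] for row, dp in zip(lam_inv, d)]  # (15.42)
    return t_prime, d

Lam = [[.5432, .4756], [-.8396, .8797]]                # matrix (15.30)
T_prime, d = primary_system(Lam)
phi = matmul(T_prime, transpose(T_prime))              # primary-factor correlations
print(round(phi[0][1], 3), [round(1.0 / x, 4) for x in d])

# Pattern row from a structure row: each element is multiplied by 1/d_p.
D_inv_24 = [1.1075, 1.0939, 1.1128, 1.1251]            # from the text above
v_row = [.014, .094, .598, .051]                       # test 1, Table 15.5
p_row = [v * f for v, f in zip(v_row, D_inv_24)]
print([round(p, 3) for p in p_row])
```

Rounding in the published figures means the computed pattern row may differ from the table by a unit in the third decimal place.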

The objective solution of Table 15.5 may be compared with the solution obtained
by subjective graphical methods, and also with the oblimax solution of Table 15.2.
There is a striking similarity between the biquartimin and each of the other two
solutions. The biquartimin factors have smaller intercorrelations than either those
arrived at by graphical methods or those determined by the oblimax method.
The relative contributions of the factors to the total communality of 11.383 appear
to be somewhat different. The complete analysis of the contributions of the
biquartimin factors is presented in Table 15.6. These analytically determined factors
have a greater variation in their direct contributions than those obtained
subjectively by passing the primary factors through the points representing composite
variables. In other words, there is a greater tendency toward level contributions of
factors determined by intuitive-graphical methods.

Table 15.6

Total Contributions of Biquartimin Primary Factors

Factor       T_1       T_2       T_3       T_4

T_1         3.070
T_2          .172     2.039
T_3          .581      .251     2.465
T_4          .267      .340      .364     1.839

Grand total = 11.388

Since the calculation of an analytical oblique solution for a problem of twenty-four
variables and four factors can be obtained in a matter of minutes on an electronic
computer, several additional solutions to the one in Table 15.5 were determined.
Without presenting the detailed tables, certain interesting properties will be noted.
As an empirical test of convergence, two biquartimin solutions were obtained, once
starting with the centroid solution and then again starting with the varimax solution
(Table 14.6). The results were almost identical, with the maximum difference for any
element of the final structure matrix V being less than .01.

Two other oblimin solutions were computed, for $\gamma = 0$ and $\gamma = 1$, each time
starting with the same centroid solution. Of course these solutions turned out to be
quite different from the biquartimin solution of Table 15.5 since they were designed
to satisfy different criteria. One striking difference was that the quartimin ($\gamma = 0$)
solution contained forty-one small negative entries while the covarimin ($\gamma = 1$)
solution did not contain a single negative entry. While it would require too much
space to show these solutions, some of the statistics associated with them are
presented in Table 15.7. Either of the transformation matrices given in this table may
be applied to the centroid pattern to get a corresponding oblimin reference structure.
From the correlations among the primary factors, it will be noted
that the quartimin factors are most highly correlated, the covarimin factors are
least correlated, while the biquartimin factors are between these extremes. It was
empirical evidence of this type, coupled with its own rationale, that led Carroll to
the development of the biquartimin criterion.


Table 15.7

Some Statistics for the Quartimin and Covarimin Solutions for Twenty-Four
Psychological Tests

                       Quartimin Solution (γ = 0)            Covarimin Solution (γ = 1)
Factor                1        2        3        4           1        2        3        4

Transformation Matrix Λ
                     Λ_1      Λ_2      Λ_3      Λ_4         Λ_1      Λ_2      Λ_3      Λ_4
  1                  .234     .201     .170     .160        .671     .545     .631     .599
  2                 -.588    -.376     .499     .434       -.378     .625    -.576     .536
  3                 -.714     .737    -.530     .425        .514    -.386    -.458     .246
  4                  .301    -.525    -.665     .778       -.378    -.405     .246     .542

Correlations Among Primary Factors
                     T_1      T_2      T_3      T_4         T_1      T_2      T_3      T_4
  T_1               1.000     .680     .564     .653       1.000    -.037    -.304    -.080
  T_2                        1.000     .604     .655                1.000    -.020    -.339
  T_3                                 1.000     .724                         1.000    -.044
  T_4                                          1.000                                  1.000

Number of iterations          1,218                                   121
Criterion
  Initial                    132.22                                -12.69
  Final                        4.03                                -21.92
The final application of the oblimin methods is to the problem of eight political
variables for which the method of 15.2 broke down. The oblimin methods applied
to the same difficult problem lead to satisfactory results. First, the quartimin method
of 15.3 is applied, using a desk calculator, and after six iterations the criterion
stabilizes at the minimum N = .1819 (from an original value of N = .4344 for the
initial principal-factor solution). The resulting structure is the first one exhibited in
Table 15.8. Alongside of this solution is a corresponding one obtained on an IBM 704,
but in which normalized loadings were employed in the course of the calculations.


Table 15.8

Three Oblimin Solutions and Binormamin Solution for Eight
Political Variables

(Initial solution: Principal-Factor, Table 8.18)

           Quartimin (γ = 0) Solutions*      Covarimin       Biquartimin (γ = .5)             Binormamin Solution
           On Desk         On Electronic     (γ = 1)         Solution
           Calculator†     Computer          Solution‡
Variable   Structure V     Structure V                       Structure V     Pattern P        Structure V     Pattern P
   j       Λ_1     Λ_2     Λ_1     Λ_2       T_1     T_2     Λ_1     Λ_2     T_1     T_2      Λ_1     Λ_2     T_1     T_2

   1       .63     .09     .71     .14       .74    -.04     .74     .09     .77     .09      .73     .12     .79     .12
   2       .90     .23     .98     .30      1.00     .05    1.00     .23    1.04     .24     1.00     .27    1.07     .28
   3       .64    -.07     .78    -.01       .86    -.22     .84    -.06     .87    -.06      .82    -.03     .87    -.03
   4      -.44     .32    -.62     .26      -.76     .46    -.71     .32    -.74     .33     -.68     .29    -.73     .31
   5      -.37    -.70    -.20    -.69      -.03    -.71    -.09    -.70    -.09    -.73     -.13    -.70    -.13    -.74
   6       .51    -.25     .68    -.19       .80    -.39     .76    -.24     .79    -.25      .74    -.21     .79    -.23
   7       .08     .72    -.15     .68      -.36     .79    -.29     .71    -.30     .74     -.24     .70    -.26     .74
   8      -.43     .40    -.64     .34      -.80     .55    -.75     .39    -.78     .40     -.72     .36    -.77     .39

r_{T1T2}      -.63            -.47              .00                 -.26                             -.35

* In order to save space only the reference structures are exhibited. The primary pattern for the
first solution is given by P = 1.291V while the corresponding relationship for the second solution is
P = 1.134V.

† This solution is the only one not based on normalized loadings.

‡ Because this oblique varimax solution actually resulted in an orthogonal frame of reference the Λ- and
T-axes coincide; and the four otherwise distinct matrices (reference structure, reference pattern, primary
structure, and primary pattern) all collapse into a single matrix involving orthogonal factors.
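
The proportionality factors quoted in the first footnote can be related to the tabled correlations by a small check. It rests on an assumption that holds only in the two-factor case: each reference axis is perpendicular to the other primary axis, so the correlation between a primary axis and its own reference axis is $\sqrt{1 - r^2}$, and the pattern is the structure multiplied by the reciprocal of that quantity. The function name is hypothetical.

```python
import math

def pattern_multiplier(r):
    """Factor k in P = kV for two oblique factors whose primary axes
    correlate r (two-factor case only)."""
    return 1.0 / math.sqrt(1.0 - r * r)

# Correlations between primary factors as tabled (two decimals only).
for label, r in [("quartimin, desk calculator", -.63),
                 ("quartimin, electronic computer", -.47),
                 ("biquartimin", -.26),
                 ("binormamin", -.35)]:
    print(label, round(pattern_multiplier(r), 3))
```

Because the correlations are printed to two decimals, the multipliers recovered this way agree with the footnote only to about two decimals as well.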


Two additional oblimin solutions, the covarimin ($\gamma = 1$) and the biquartimin
($\gamma = .5$), for the eight political variables were also obtained on an IBM 704 and
are shown in Table 15.8. Again, it will be noted that the quartimin factors tend to
be highly correlated, the covarimin factors tend toward orthogonality (actually
so in this case), while the biquartimin factors assume some intermediate positions.
Any one of these solutions is a better approximation to simple structure than the
one attempted by graphical methods (in the first edition of this text). The results of
the biquartimin solution are exhibited in Figure 15.1. Of course, after obtaining
the results of the analytical solutions it is easy to see wherein the original intuitive
judgments could be improved upon, and this kind of rationalization is to be expected.

An oblique solution for the difficult example of eight political variables was also
obtained according to the binormamin criterion (15.40). This solution is shown at
the extreme right of Table 15.8, and as far as meeting the simple structure principles
it does about as good a job as any of the oblimin solutions, coming closest to the
biquartimin form. With regard to the degree of obliqueness, its primary factors are
more correlated than those of the biquartimin solution but less than the
quartimin.

While the entire class of oblimin solutions is infinite, depending on the value of
parameter $\gamma$, three particular forms have been singled out: the quartimin for
$\gamma = 0$, the covarimin for $\gamma = 1$, and the biquartimin for $\gamma = .5$. In addition there is
the binormamin solution based upon the criterion (15.40), and a fifth major form is
the oblimax solution. An important consideration in making a choice of form of
solution is the degree of correlation among the primary factors. All five of these
methods were employed in the analysis of the eight physical variables, the eight
political variables, and the twenty-four psychological variables. From these
studies the covarimin criterion was found to lean very strongly toward orthogonality;
the quartimin criterion was found to produce primary factors which were highly
correlated; the biquartimin criterion led to an intermediate position but tending
toward the lower correlations; and the binormamin criterion also led to an
intermediate position but somewhat higher than the biquartimin factors. The oblimax
criterion also leads to primary factors which are highly correlated, like the quartimin
factors. From the viewpoint of showing least bias toward orthogonality or
obliqueness, either the biquartimin or the binormamin solutions are most satisfactory. As
noted, the superiority of either of these solutions over the other is apt to stem
from the inherent nature of the data rather than from a theoretical difference.

It should be noted that the problem of analytical rotation in the orthogonal case
is essentially resolved. Several very competent methods are available in chapter 14
and chances are that they can be little improved upon. On the other hand, analytical
methods for the oblique case are fundamentally different and remain in a
developmental state. An important new development is presented in 15.5.

Since the trend is toward oblique simple structure, Schmid and Leiman [420]
became concerned with the difficulties that may arise in the psychological
interpretation of oblique factors. They propose a method for transforming such a solution
into an orthogonal one, still preserving simplicity but involving a larger number of
orthogonal factors. Their model of an hierarchical factor solution is a natural
extension of the bi-factor solution, including subgroups of the group factors. Unlike
the bi-factor method, or Burt's [61] group-factor method which involves the successive
grouping of variables according to their sign pattern in a centroid solution, the
hierarchical solution proposed by Schmid and Leiman is derived from successively
obtained higher-order factor solutions. These higher-order solutions are the
factorizations of the matrices of correlations among the oblique factors. If an oblique
simple-structure type solution can be obtained at each level then their procedure can be
used to recast the several higher-order factors into an orthogonal hierarchical factor
pattern.


15.5. Direct Oblimin 

As noted in 13.5, the bi-orthogonal system of coordinate axes (reference and
primary) played an important role in the development of oblique primary-factor
solutions. Now the computational capabilities are available to make obsolete this
rather awkward approach. An important breakthrough was made by Jennrich and
Sampson [279] when they derived an analytical procedure to go directly from an
initial to a primary-factor pattern. Their procedure also involves a parameter,
different values of which lead to a whole class of oblimin-like solutions. Because a
primary-factor pattern is obtained directly, without involving an intermediate
reference structure, and because the method involves oblique factors and a
minimizing criterion, it is designated “direct oblimin” in keeping with the previous
names. Jennrich and Sampson use the term “simple loadings.”

The general procedure for getting a simple structure solution has been to maximize,
as in (15.2), or minimize, as in (15.15) or (15.39), some function of the reference
structure elements. The ultimate objective is to make the primary-factor pattern satisfy
the simple structure principles. Since the reference-factor structure and the
primary-factor pattern are simply related by (13.43), it follows that the latter will look
simple if the former does. This property has permitted the indirect approach of
simplifying the reference structure to be used as a basis for getting the desired
primary-factor solution.

The point of departure in the Jennrich and Sampson approach is to seek a simple
structure solution directly by minimizing a function of the primary-factor-pattern
coefficients. Thus, in place of the criterion (15.38) that involves the structure values,
the corresponding function for the direct oblimin method may be expressed in the
form

(15.45)    $F(P) = \sum_{p<q=1}^{m} \left( \sum_{j=1}^{n} b_{jp}^2 b_{jq}^2 - \frac{\delta}{n} \sum_{j=1}^{n} b_{jp}^2 \sum_{j=1}^{n} b_{jq}^2 \right),$

where P is the primary-factor-pattern matrix with elements $b_{jp}$. In order to avoid
confusion with the (indirect) oblimin methods of the preceding section, the parameter
$\delta$ is employed in the present section instead of $\gamma$, which Jennrich uses. Of course, the
original factor loadings $a_{jp}$ may be normalized by rows, i.e., divided by $h_j$, and the
transformation to the final matrix P may be carried through in terms of the extended
vectors. The original lengths are restored at the end of the process by multiplying
by $h_j$. In any event, the direct oblimin solution is obtained by minimizing
F(P) in (15.45). It will be recalled from (13.26) that

(15.46)    $P = A(T')^{-1},$

so the problem amounts to finding a transformation matrix T that will
minimize $F(A(T')^{-1})$ under the side condition

(15.47)    $\text{diag}(T'T) = I.$

In the paper by Jennrich and Sampson [279], the mathematical development is
presented for the simplest case, what might be called the “direct quartimin,”
when $\delta = 0$ in (15.45), and comparisons are made with the biquartimin solutions
for two practical problems. The mathematical details actually are given for one
elementary rotation (involving only two primary factors), from which the
generalization of the process is readily evident. Rotations of this type are performed
systematically using all possible pairs of factors until F(P) converges.

A FORTRAN IV subroutine which implements the direct oblimin procedures has
been written by Jennrich and Sampson, and has been adapted by the author [202]
in FORTRAN II for use on Philco 2000 and in FORTRAN IV for use on IBM 7044.
In this program, the criterion for convergence is

[...]

where i is the iteration number and $\epsilon$ is usually .00001. Actually, the function
minimized in the subroutine is twice that in (15.45); the sum over the factors in the
subroutine is only restricted to $p \neq q$, and since the terms are symmetric in p and q the
resulting sum is twice what it would be if these indices were not permitted to be
interchanged. The output of the program is an oblique factor solution satisfying the
principles of simple structure, more or less. The solution consists of the factor pattern,
the correlations among the factors, and the factor structure (all of these, of course,
refer to the primary-factor solution).

When applying the direct oblimin criterion (15.45), it is possible for the minimum
of F(P) to approach $-\infty$ if positive values of $\delta$ are employed. Specifically, Jennrich
has demonstrated (in a private communication to the author) that F(P) approaches
$-\infty$ if and only if $\delta > 4/5$. For practical purposes, it is recommended that the value of
$\delta$ in (15.45) be zero or negative (to be distinguished from the range of zero to one
for $\gamma$ in the indirect oblimin methods). When $\delta$ is zero, the factors are most oblique
(even higher correlations among the factors can be obtained for positive fractional
values of $\delta$). For negative values of $\delta$ the factors become less oblique as $\delta$ gets smaller.
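
The criterion (15.45), together with the factor of two noted above for a sum over all $p \neq q$, can be sketched as follows. The function name and the `ordered_pairs` switch are hypothetical; they are not taken from the Jennrich and Sampson subroutine, and the sketch only evaluates the function without minimizing it.

```python
def direct_oblimin_criterion(P, delta, ordered_pairs=False):
    """Evaluate F(P) of (15.45) for a pattern matrix P (list of rows).

    With ordered_pairs=True the sum runs over all p != q instead of p < q,
    as described for the subroutine, and the result is doubled.
    """
    n, m = len(P), len(P[0])
    sq = [[b * b for b in row] for row in P]
    col_sums = [sum(row[p] for row in sq) for p in range(m)]
    total = 0.0
    for p in range(m):
        for q in range(m):
            if p == q or (not ordered_pairs and p > q):
                continue
            cross = sum(row[p] * row[q] for row in sq)
            total += cross - (delta / n) * col_sums[p] * col_sums[q]
    return total

# An artificial pattern with perfect simple structure.
P = [(1.0, 0.0), (0.0, 1.0)]
print(direct_oblimin_criterion(P, 0.0))        # direct quartimin: no cross-products
print(direct_oblimin_criterion(P, -1.0))       # negative delta adds a penalty term
print(direct_oblimin_criterion(P, -1.0, ordered_pairs=True))
```

For negative values of the parameter the second term is a positive penalty on the product of the column sums of squares, so its size relative to the cross-product term grows as the parameter becomes more negative.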

The last mentioned property is immediately evident from the three direct oblimin
solutions shown in Table 15.9: the correlation between the two factors goes down
as the value of $\delta$ decreases. Other properties may also be determined from a study
of this table, and from supplementary data obtained by running this problem for
values of $\delta$ from -100 to +1. First, of course, it is of interest to see how the various
direct solutions compare with the oblimin solutions of the preceding section. For
$\delta = 0$, the solution is very similar to that of the quartimin, for which the reference
structure is given in Table 15.4 and the primary factor pattern is provided in the
answer to ex. 8, chap. 15. The differences between respective factor coefficients are
only in the third decimal place. For $\delta = -1$ to $\delta = -4$, the resulting direct oblimin
solutions are roughly like the biquartimin, with the correlation between the primary
factors ranging from .337 to .285 and all factor coefficients being positive and forming
the two very distinct clusters.

Table 15.9

Three Direct Oblimin Factor Patterns for Eight Physical Variables

(Initial solution: Minres, Table 9.3)

Variable                    δ = 0               δ = -.5              δ = -70
   j                     T_1      T_2        T_1      T_2         T_1      T_2

   1                     .883     .065       .866     .115        .819    -.411
   2                     .956    -.029       .933     .027        .802    -.498
   3                     .926    -.045       .902     .010        .762    -.491
   4                     .882     .035       .863     .085        .792    -.427
   5                     .005     .940       .061     .918        .804     .490
   6                    -.006     .803       .042     .784        .678     .424
   7                    -.065     .793      -.017     .770        .618     .448
   8                     .104     .646       .140     .637        .640     .286

Criterion (15.45)
   Initial value            1.453                2.228               110.071
   Final value               .036                 .968               108.597

Correlation between
primary factors              .471                 .373                  .002

Experimentation was continued with different values of $\delta$ for the problem of eight
physical variables. By the time $\delta$ gets down to -6.5, the resulting factor pattern
closely resembles the initial minres solution (the first factor coefficients agree
within .001 while the second factor coefficients differ by three units in the second
decimal place). The correlation between the primary factors is .044 as contrasted
with the zero correlation of the minres case. Even when the correlation is reduced
to .002 (for $\delta = -70$ to $\delta = -100$), the resulting factor pattern is not materially
different from that obtained for $\delta = -6.5$. Specifically, the first axis goes through the
cluster of all the points and the second is essentially at right angles to it, just as in
the case of the minres solution. For no value of $\delta$ was a solution found that could be
said to be similar to the covarimin (reference structure given in Table 15.4) with its
features of zero correlation between the primary factors and all positive factor
coefficients. The direct oblimin solutions for large negative values of $\delta$ satisfy the
condition of near zero correlation while the solution for $\delta = -4$ satisfies the property
of positive factor coefficients, but no single value of $\delta$ produced all the features of the
covarimin solution. Before leaving this problem, a couple of positive values of $\delta$
were tried also. For $\delta = .5$, the correlation between the factors reached .748 and the
factor coefficients did not satisfy the simple structure principles nearly as well as
for the $\delta = 0$ solution in Table 15.9. Finally, just as an empirical test it was found
that for $\delta = 1$ the function in (15.45) did in fact become very large negatively before
the calculations in the computer were stopped.

The behavior of the direct oblimin method was also explored for the problem of 
twenty-four psychological tests. This was done both with an initial minres solution 
and an initial centroid solution. Because of space limitations, only one complete 
solution is presented, in Table 15.10, but conclusions will be drawn from all of the 
experimental work. As a general observation, the direct oblimin solution in this 
table is quite similar to the biquartimin solution of Table 15.5 although the two 
solutions were obtained by different methods and were based on different initial 
matrices. The correlations among the direct oblimin factors are consistently, but 
slightly, larger than the corresponding correlations of the biquartimin solution 
(Table 15.5), but not as large as the correlations among the quartimin factors (Table 
15.7). 

As δ increases negatively, it seems that the resulting factor pattern tends to return 
to the form of the original input and the correlations among the factors tend toward 
zero. The ultimate results did not appear, however, up to δ = -95. No negative δ 
produced a solution with as high correlations among the factors as the (indirect) 
quartimin. When δ was permitted to be positive, experimenting with δ from .1 to 
Table 15.10 

Direct Oblimin Solution (δ = 0) for Twenty-Four Psychological Tests 
(Initial solution: Minres, m = 4) 

              Primary Structure: S                     Primary Pattern: P
Test    Verbal   Speed  Deduction  Memory      Verbal   Speed  Deduction  Memory
  j     r_jT1    r_jT2    r_jT3    r_jT4         T1       T2       T3       T4

  1      .353     .331     .731     .344        .008     .113     .680     .035
  2      .234     .170     .478     .198        .029     .026     .458    -.001
  3      .283     .104     .574     .244        .053    -.095     .564     .039
  4      .364     .202     .577     .232        .149     .014     .523    -.037
  5      .793     .345     .365     .347        .760     .107     .009    -.011
  6      .816     .225     .383     .412        .785    -.069     .024     .103
  7      .849     .283     .361     .284        .872     .041     .009    -.096
  8      .674     .361     .481     .345        .547     .131     .211    -.012
  9      .857     .208     .376     .404        .850    -.094     .004     .085
 10      .289     .829     .048     .308        .118     .856    -.273     .045
 11      .336     .617     .251     .503        .074     .493    -.045     .305
 12      .190     .734     .294     .259       -.084     .734     .124    -.029
 13      .359     .621     .515     .292        .069     .519     .359    -.070
 14      .318     .199     .177     .591        .126    -.032    -.095     .588
 15      .244     .186     .220     .554        .022    -.029     .005     .554
 16      .265     .205     .501     .596       -.083    -.069     .357     .518
 17      .284     .333     .198     .631        .031     .122    -.086     .606
 18      .220     .449     .406     .554       -.136     .267     .221     .425
 19      .282     .279     .335     .438        .049     .096     .162     .319
 20      .523     .252     .529     .449        .304    -.016     .323     .204
 21      .361     .528     .503     .386        .061     .376     .329     .092
 22      .511     .274     .516     .445        .290     .017     .308     .199
 23      .545     .377     .627     .425        .279     .125     .433     .096
 24      .511     .588     .346     .465        .294     .421     .026     .175

Factor          Correlations Among Factors
            T1       T2       T3       T4
  T1      1.000     .316     .432     .414      Criterion (15.45):
  T2               1.000     .296     .374        Initial value, 5.221
  T3                        1.000     .387        Final value, 1.740
  T4                                 1.000


.5 in .1 increments, it was found that for δ = .4 the resulting solution was most nearly 
like the quartimin. The correlations among the factors for these two solutions are 
shown in the top blocks of Table 15.11. 

Unlike the preceding example, for the twenty-four psychological tests the direct 
oblimin solution with δ = 0 does not approximate the quartimin but comes closer 
to the biquartimin. Actually, the solution for δ = -.5 is a better approximation to 
the biquartimin than that produced by δ = 0. The correlations among the factors 
for each of these solutions, as well as those of the biquartimin factors, are exhibited in 
Table 15.11. The actual 96 factor coefficients for the biquartimin and each of these 
Table 15.11 

Comparison of Direct and Indirect Oblimin Factor Correlations: 
Twenty-Four Psychological Tests, Four Factors 

(Initial solution: Centroid) 

            Direct Oblimin                           Indirect Oblimin
Factor    T1      T2      T3      T4       Factor    T1      T2      T3      T4

                δ = .4                               Quartimin (γ = 0)
  T1    1.000    .646    .712    .717        T1    1.000    .680    .564    .653
  T2            1.000    .633    .701        T2            1.000    .604    .655
  T3                    1.000    .673        T3                    1.000    .724
  T4                            1.000        T4                            1.000

                δ = 0                                Biquartimin (γ = .5)
  T1    1.000    .313    .434    .405        T1    1.000    .262    .341    .337
  T2            1.000    .313    .412        T2            1.000    .295    .338
  T3                    1.000    .376        T3                    1.000    .329
  T4                            1.000        T4                            1.000

                δ = -.5                              Covarimin (γ = 1)
  T1    1.000    .270    .379    .335        T1    1.000   -.037   -.304   -.080
  T2            1.000    .280    .364        T2            1.000   -.020   -.339
  T3                    1.000    .329        T3                    1.000   -.044
  T4                            1.000        T4                            1.000

Average Correlation: r

    δ       r         δ       r         δ       r         γ       r      Type
   .5     .841       .2     .443      -.5     .326        0     .647    Quartimin
   .4     .680       .1     .400      -10     .283       .5     .317    Biquartimin
   .3     .524        0     .375      -95     .031        1    -.137    Covarimin


direct oblimin solutions were compared, yielding mean differences of .0114 and 
.0155 for δ = -.5 and δ = 0, respectively; and for the more important coefficients 
(i.e., a_jp > .300), these mean differences are .0109 and .0147, respectively. 

Further experimentation with the 24-variable problem was conducted for values 
of δ from -10 to -95 in increments of -5. First, it should be noted that the process 
failed to converge within the limit of 100 iterations (for δ = 0 it required only about 
a dozen iterations for convergence), and the criterion function (15.45) was very large 
for these values of δ and changed very little in successive iterations. As noted before, 
the factor correlations decrease as δ becomes larger negatively. The first instance 
where all the factor correlations are zero in the first decimal place is for δ = -25. 
In this sense, that solution is like the covarimin; but just as in the case of the 8-variable 
example, it appears to be more like the original centroid pattern than the intended 
covarimin form with all positive factor coefficients and very low factor correlations. 

Direct oblimin solutions, with δ = 0, are presented for two additional problems, 
but without any discussion or comparison with the indirect methods. These are the 
solutions for the five socio-economic variables in Table 15.12 and for the eight 
political variables in Table 15.13. 

Table 15.12 

Direct Oblimin Solution (δ = 0) for Five Socio-Economic Variables 

(Initial solution: Minres, Table 9.2) 

Variable     Primary Structure: S      Primary Pattern: P
    j          r_jT1     r_jT2            T1        T2

    1           .120      .997          -.073     1.011
    2           .870      .089           .886     -.080
    3           .242      .978           .057      .967
    4           .826      .490           .761      .344
    5           .982      .082          1.003     -.110

Factor       Factor Correlations
                T1        T2
   T1         1.000      .191           Initial value, .976
   T2                   1.000           Final value, .138



Table 15.13 

Direct Oblimin Solution (δ = 0) for Eight Political Variables 

(Initial solution: Principal-Factor, Table 8.18) 

Variable     Primary Structure: S      Primary Pattern: P
    j          r_jT1     r_jT2            T1        T2

    1           .735     -.102           .772      .127
    2           .964     -.024          1.050      .288
    3           .886     -.288           .878     -.027
    4          -.839      .518          -.751      .295
    5           .130     -.703          -.086     -.729
    6           .866     -.454           .802     -.215
    7          -.521      .815          -.305      .724
    8          -.904      .610          -.792      .374

Factor       Factor Correlations
                T1        T2
   T1         1.000     -.297           Initial value, .743
   T2                   1.000           Final value, .452
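
The primary structure and primary pattern printed side by side in these tables are tied together by the relation (2.43) between structure and pattern, S = PΦ, where Φ is the matrix of factor correlations. The following is only a sketch of how a reader might check a printed table against that relation; it uses the Table 15.12 values and assumes NumPy is available. The variable names are illustrative and not part of the text.

```python
import numpy as np

# Primary pattern P and factor correlations Phi as printed in Table 15.12.
P = np.array([[-0.073,  1.011],
              [ 0.886, -0.080],
              [ 0.057,  0.967],
              [ 0.761,  0.344],
              [ 1.003, -0.110]])
Phi = np.array([[1.000, 0.191],
                [0.191, 1.000]])

# Primary structure S as printed in the same table.
S_printed = np.array([[0.120, 0.997],
                      [0.870, 0.089],
                      [0.242, 0.978],
                      [0.826, 0.490],
                      [0.982, 0.082]])

# Structure implied by the pattern: S = P Phi.
S_computed = P @ Phi

# Largest disagreement; only three-decimal rounding error is expected.
max_gap = np.abs(S_computed - S_printed).max()
print(np.round(S_computed, 3))
print("largest gap:", max_gap)
```

Agreement to within rounding of the third decimal is what one would expect if the table has been transcribed correctly.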


From the theoretical developments and the empirical investigations, certain 
tentative conclusions can be drawn regarding the direct oblimin methods. First, it 
becomes quite evident that no simple relationship exists between the direct and the 
indirect methods. That is not to say that striking agreements cannot be found 
between results obtained by the two approaches, but merely that there is not a very 
simple way of saying what value of δ in the direct method will lead to an equivalent 
result for a given γ in the indirect method. Secondly, there is no compelling reason 
to accept the three special instances of the indirect oblimin, namely quartimin (γ = 0), 
biquartimin (γ = .5), and covarimin (γ = 1), as superior to direct oblimin. In some 
sense, the direct oblimin is superior, not only because of its greater simplicity 
but because of the wider range of oblique solutions that are possible. However, this 
very flexibility may detract from the direct method. It may be advisable, in due course 
of time, to recommend certain “preferred” values of δ that may be expected to have 
special properties, depending on the number of variables, number of factors, and 
perhaps other parameters. 
PART IV 


FACTOR MEASUREMENTS 








16 

Measurement of Factors 


16.1. Introduction 

There are two basic problems with which factor analysis is concerned. The first of 
these deals with the methods for obtaining the linear resolution of a set of variables in 
terms of hypothetical factors. Most of the preceding work has been devoted to the 
solution of this problem. The results are the several different orthogonal and oblique 
solutions. The second problem is concerned with the description of the factors in 
terms of the observed variables, and is the subject matter of the present chapter. 

While there has been much research, and new theory and computing techniques 
for determining the factor weights of the variables in terms of the factors, there has 
not been a corresponding effort directed toward the expression of the factors in terms 
of the variables. The limited work on factor measurements since the early 1940’s 
includes theoretical papers by Kestelman [314] and Heerman [221], a brief note by 
Thomson [464], a comparison with approximate methods by Baggaley and Cattell 
[19], such applications as reported by Wenger, Holzinger, and Harman [508] and 
Selvin [426], and several computer programs. 

Methods for expressing the hypothetical constructs (the factors) in terms of the 
observed variables are developed in this chapter. First, the need for “estimation” 
rather than determining factor measurements directly is discussed in 16.2. This is 
followed, in 16.3, by the special situation for the measurement of principal components. 
Then various means are considered for estimating factor measurements when 
the classical factor analysis model is employed. The best prediction, in the least-squares 
sense, is that obtained by ordinary regression methods. In 16.4 the linear regression 
of any factor on the n observed variables is obtained by the usual method. This is 
followed by an approximation method, which employs composite variables, in order 
to reduce the laborious task of the complete regression method. Another approach 
is presented in 16.7 which is superior to either of the preceding because it is more 
rapid and usually gives results as accurate as the complete estimation method. In 
effect, the “short method” replaces the observed correlations of variables by those 
reproduced (or computed) from the factor solution. 

Two other methods, which at present do not seem to be as practical as the preceding 
ones, also are given. In 16.8 a regression method is presented in which the sum of 
the squares of the unique factors is minimized. This method produces estimates of 
factors which are usually quite different from those given by any of the other methods. 
The final method for describing the factors in terms of the variables involves the 
mathematical solution of a set of equations rather than the statistical estimation by 
regression. Hence the factors themselves, instead of estimates of them, are obtained. 
Unfortunately, however, this solution is in terms of “ideal” variables (not the 
observed ones) and therefore cannot be employed in a practical way. 

16.2. Direct Solution versus Estimation 

To facilitate the development of the theory and methods of this chapter, as well as 
to bring together many of the concepts of the previous chapters, a summary of the 
relevant matrix notation is presented in Table 16.1. Because it is practically impossible 
to use a separate and distinct symbol for each new concept, and to retain the particular 
symbol every place that it occurs, the intent of the table is to call attention to 
the specific uses of the notation and to provide a ready reference source. In addition 
to the basic concepts in this table, other definitions will be made, as needed, in the 
course of the development of the various procedures. 

A set of n variables can be analyzed either (a) in terms of common factors only, 
by inserting unities in the diagonal of R; or (b) in terms of common and unique 
factors, by inserting communalities in the diagonal of R. These two approaches, of 
course, correspond to the component analysis and the classical factor analysis models, 
respectively, as first presented in 2.3. In the first instance R is a Gramian matrix, 
generally of rank n, and the factor solution 

(16.1) z = Af 

is in terms of n common factors. Since A is a square non-singular matrix, in this 
instance, it will have an inverse. Then the required factor measurements are given 
simply by: 

(16.2) $f = A^{-1}z$. 

This solution is determined exactly, is unique, and involves no “estimation.” 

However, when the factor model involves common and unique factors the solution 
is not so simple. Then the total number of factors exceeds the number of variables, 
and an inverse does not exist for the factor matrix M. The generally accepted procedure, 
in this case, is to resort to the “best fit” in the least-squares sense. Such methods 
are developed in the ensuing sections. 

Kestelman [314] proves that even when the total number of factors exceeds the 
number of variables, exact numerical specifications can be found for the factors (but 
not unique values) such that 

FF' = I, 
Table 16.1 

Notation for Matrices Frequently Used 

Matrix         Order          Definition and Use

R              n x n          Matrix of observed correlations among the n variables.

R†             n x n          Matrix of reproduced correlations from a factor solution with
                              communalities in the principal diagonal. When there can be
                              no confusion, the dagger may be dropped.

R† + D²        n x n          Matrix of reproduced correlations with unities in diagonal.

Z = (z_ji)     n x N          Matrix of N measurements on each of the n variables.

z = {z_j}      n x 1          Column vector of the n variables.

F = (F_pi)     m x N          Matrix of N measurements on each of the m common factors.

f = {F_p}      m x 1          Column vector of the m common factors.

u = {U_j}      n x 1          Column vector of the n unique factors.

M = (A|D)      n x (m + n)    Complete pattern matrix.

A              n x m          General matrix of common-factor coefficients. Also, initial
                              orthogonal solution when transformation to oblique solution
                              is involved.

D              n x n          Matrix of unique-factor coefficients. Also matrix for unique-
                              factor portion of factor structure.

S              n x m          Factor structure matrix. Only the common-factor portion is
                              usually of interest; both the common and unique portions
                              may be represented by (S|D).

Φ              m x m          General matrix of correlations among a set of oblique common
                              factors.

Ψ              m x m          Matrix of correlations among common factors when these are
                              reference axes and have to be distinguished from the primary
                              factors.

P              n x m          Primary-factor pattern matrix (in oblique factor analysis an
                              initial principal-factor solution is designated A rather than P).

S              n x m          Primary-factor structure matrix (in oblique factor analysis this
                              is not mistaken for a general factor structure matrix).

V              n x m          Reference-factor structure matrix.

W              n x m          Reference-factor pattern matrix.


i.e., the factor measurements are in standard form and uncorrelated. He also indicates 
that such theoretical measurements are not necessarily superior, from a statistical 
standpoint, to the correlated estimates obtained by regression methods. 

The concern with the fact that the regression methods lead to correlated estimates 
of factor measurements is taken up by Heerman [221]. He develops two procedures 
for transforming the least-squares estimates of orthogonal factors so as to produce 
uncorrelated measurements. These he calls “orthogonal approximations.” He also 
derives “univocal estimates” of orthogonal factors that have the property of not 
correlating with any of the factors except those they were designed to estimate. Of 
course, in making any of these transformations there is bound to be a loss in the 
validity of the estimators, since the regression methods determine estimates that give the 
“best” fit according to the least-squares principle. However, in considering alternative 
advantages of uncorrelated factor measurements and lack of correlations with other 
factors, Heerman argues that “it would appear that the orthogonal estimators 
represent something of a compromise between the maximum validity of least-squares 
estimators and the purity of the univocal estimators” [221, p. 172]. 

16.3. Measurement of Principal Components 

When the analysis is in terms of principal components, the factor measurements 
can be obtained directly. Not only does the inverse of the factor matrix exist 
when all n components have been obtained, so that the factor measurements are given 
by (16.2), but actually it is not necessary to calculate the inverse of the factor matrix. 
Furthermore, in the more practical situation when only a few of the larger components 
are used, the simplified procedure still applies. 

In the factor model (16.1) it need not be assumed that A is square; it can be of 
order n x m. Premultiplying both sides of (16.1) by A' produces 

$A'z = A'Af$, 

and solving for f explicitly, the final result is 

(16.3) $f = (A'A)^{-1}A'z = \Lambda^{-1}A'z$, 

where $\Lambda$ is the diagonal matrix of the m eigenvalues retained, corresponding to 
(8.22). From this expression, or the equivalent algebraic form: 

(16.4) $F_p = \sum_{j=1}^{n} \frac{a_{jp}}{\lambda_p} z_j$   (p = 1, 2, ..., m), 

it is clear that the principal components are described mathematically as linear 
combinations of the variables, with no question of statistical estimation. The coefficients 
in the equation for any principal component in terms of the variables are 
obtained simply by dividing its “factor loadings” by its eigenvalue. This property 
was first derived by Hotelling [259, sec. 8] and generalized more recently by S 

To illustrate the foregoing, consider the principal component solution of Table 8.1 
for the five socio-economic variables. The necessary eigenvalues are the square roots of 
the variances in the second to last row of that table. Then dividing the numbers in 
the columns by the respective eigenvalue produces the required coefficients. For 
example, the resulting equation in the case of the second component ($F_2$) is: 

$F_2 = 1.0809z_1 - .7302z_2 + .9731z_3 - .1398z_4 - .7482z_5$. 
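
As a hedged illustration of (16.3) and (16.4), the sketch below builds principal components from a small set of made-up data (not the socio-economic variables of the text) and obtains the component measurements by dividing each column of loadings by its eigenvalue. Here the eigenvalue of a component is taken to be the sum of squares of its loadings, which is what (16.3) implies since $\Lambda = A'A$. NumPy and all variable names are assumptions of the example.

```python
import numpy as np

# Hypothetical data: N = 200 observations on n = 5 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Standard scores (rows are observations here) and their correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = (Z.T @ Z) / len(Z)

# Principal components: eigenvalues in descending order, loadings scaled
# so that the sum of squares of column p equals eigenvalue p.
lam, vecs = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, vecs = lam[order], vecs[:, order]
m = 2                                   # retain the two largest components
A = vecs[:, :m] * np.sqrt(lam[:m])      # n x m matrix of loadings

# (16.4): coefficient of z_j in component p is a_jp / lambda_p.
W = A / lam[:m]

# Component measurements for every observation.
F = Z @ W

# The same coefficients from the general form (A'A)^{-1} A' of (16.3).
W_general = np.linalg.solve(A.T @ A, A.T).T
print(np.round(W, 4))
```

Because no estimation is involved, the measurements obtained this way are exactly in standard form and mutually uncorrelated for the sample at hand.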

Another important property of component analysis is that the measurements of 
any rotated components can also be obtained more simply than in the case of classical 
factor analysis. Consider a rotation of the principal component solution (say, to the 
varimax form) by means of the transformation 

(16.5) $B = AT$, 

where A is the initial matrix of coefficients of the principal components, B is the 
corresponding matrix for the rotated component solution, and T is the matrix of 
transformation. The new solution may be represented by 

(16.6) $z = Bg$, 

where g is used to distinguish the new column vector from the original f. Then premultiplying 
both sides of (16.6) by B' and solving for g produces 

(16.7) $g = (B'B)^{-1}B'z$. 

Again using the five socio-economic variables for illustration, the rotated varimax 
components are given in Table 14.6. The matrix B can be read from this table as 
follows (actually, its transpose for printing convenience): 

$B' = \begin{pmatrix} .0160 & .9408 & .1370 & .8248 & .9682 \\ .9938 & -.0088 & .9801 & .4471 & -.0060 \end{pmatrix}$ 

Simple computations produce: 

$B'B = \begin{pmatrix} 2.5218 & .5048 \\ .5048 & 2.1481 \end{pmatrix}$   and   $(B'B)^{-1} = \begin{pmatrix} .4161 & -.0978 \\ -.0978 & .4885 \end{pmatrix}$ 

Then equation (16.7) can be applied to get the equation for each of the rotated 
components, namely: 

$G_1 = -.0905z_1 + .3923z_2 - .0388z_3 + .2995z_4 + .4035z_5$, 

$G_2 = .4839z_1 - .0963z_2 + .4654z_3 + .1378z_4 - .0976z_5$. 

The measurements of these (varimax) components for the sample of N = 12 entities 
(census tracts) were computed on a Philco 2000 computer and are presented in Table 
16.2. The rotated components are designated by M’s instead of G’s because that was 

Table 16.2 

Measurements of Varimax Components: Five Socio-Economic Variables 

(Solution of Table 14.6) 
the notation used in chapter 14. To check any of these values by hand methods, the 
standard scores for the variables would first have to be calculated from Table 2.1 
and then substituted in the above equations. Alternatively, these equations could be 
transformed to raw score form and the values from Table 2.1 used directly. 
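
The arithmetic behind the two equations for the rotated components can be retraced in a few lines. The sketch below applies (16.7) to the matrix B quoted above from Table 14.6; NumPy and the variable names are assumptions of the example, and small differences in the fourth decimal are to be expected from rounding.

```python
import numpy as np

# Rotated (varimax) component matrix B, five variables by two components,
# entered from the transpose printed in the text.
B = np.array([[0.0160,  0.9938],
              [0.9408, -0.0088],
              [0.1370,  0.9801],
              [0.8248,  0.4471],
              [0.9682, -0.0060]])

BtB = B.T @ B                     # 2 x 2 matrix B'B
BtB_inv = np.linalg.inv(BtB)      # its inverse

# (16.7): g = (B'B)^{-1} B' z, so row p of W holds the coefficients of
# z_1, ..., z_5 in the equation for the p-th rotated component.
W = np.linalg.solve(BtB, B.T)

print(np.round(BtB, 4))
print(np.round(BtB_inv, 4))
print(np.round(W, 4))
```

Multiplying W into a column of standard scores for one census tract would give that tract's measurements on the two rotated components.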

While the above procedure, which led to (16.7), is relatively simple, there is still 
another approach for getting the measurements of rotated components. From (16.1) 
and (16.6) it follows that 

(16.8) $f = Tg$, 

which relationship between the two sets of components corresponds to (12.7). Premultiplying 
by T', and remembering that T'T = I for an orthogonal transformation, 
this relationship becomes 

(16.9) $g = T'f$, 

and, upon substituting (16.3) for f, 

(16.10) $g = T'\Lambda^{-1}A'z$. 

Now, the transformation matrix itself can be expressed in the form 

(16.11) $T = \Lambda^{-1}A'B$, 

which corresponds to (12.4) and which can be obtained directly from (16.5). Finally, 
substituting (16.11) into (16.10), the formula for the rotated components becomes: 

(16.12) $g = B'A\Lambda^{-2}A'z$. 

Either (16.7) or this formula yields the equations for the rotated components. The 
difference between these two expressions is that the former requires the inverse 
of B'B, involving the rotated matrix, while the latter uses the eigenvalues of the 
initial factor matrix. 
Thus, (16.7) may actually require more computations if m is of fair size because the 
matrix to be inverted is not diagonal as it is in (16.12). The results of applying formula 
(16.12) to the example of five socio-economic variables are identical to those obtained 
above; the computations are left as an exercise. 
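
The agreement of (16.7) and (16.12) can also be seen numerically. The following sketch uses a made-up correlation matrix and an arbitrary orthogonal rotation (a plane rotation through 30 degrees, standing in for a varimax transformation); none of these numbers come from the text, and NumPy is assumed.

```python
import numpy as np

# Hypothetical correlation matrix for four variables.
R = np.array([[1.0, 0.6, 0.3, 0.2],
              [0.6, 1.0, 0.4, 0.1],
              [0.3, 0.4, 1.0, 0.5],
              [0.2, 0.1, 0.5, 1.0]])

# Initial principal-component matrix A for the two largest eigenvalues.
lam, vecs = np.linalg.eigh(R)
keep = np.argsort(lam)[::-1][:2]
Lam = lam[keep]                         # retained eigenvalues (diagonal of Lambda)
A = vecs[:, keep] * np.sqrt(Lam)

# An arbitrary orthogonal transformation T and the rotated matrix B = AT (16.5).
theta = np.deg2rad(30.0)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = A @ T

# (16.7): coefficients (B'B)^{-1} B', which needs a non-diagonal inverse.
W_167 = np.linalg.solve(B.T @ B, B.T)

# (16.12): coefficients B' A Lambda^{-2} A', which needs only the eigenvalues.
W_1612 = B.T @ A @ np.diag(Lam ** -2.0) @ A.T

print(np.round(W_167, 4))
print(np.round(W_1612, 4))
```

Both routes give the same coefficient matrix; the second avoids inverting B'B, which is the practical point made above.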

16.4. Complete Estimation Method 

The remainder of this chapter deals primarily with the classical factor analysis 
model (2.9). In this and the following two sections, conventional regression methods 
are employed to obtain estimates of factor measurements. The linear regression of 
any factor $F_p$ on the n variables may be expressed in the form: 

(16.13) $\hat{F}_p = \beta_{p1}z_1 + \beta_{p2}z_2 + \cdots + \beta_{pn}z_n$   (p = 1, 2, ..., m). 

From the general theory of multivariate regression, it follows that the normal 
equations for determining the β’s are 

(16.14) 
$\beta_{p1} + r_{12}\beta_{p2} + \cdots + r_{1n}\beta_{pn} = s_{1p}$, 
$r_{21}\beta_{p1} + \beta_{p2} + \cdots + r_{2n}\beta_{pn} = s_{2p}$, 
$\cdots$ 
$r_{n1}\beta_{p1} + r_{n2}\beta_{p2} + \cdots + \beta_{pn} = s_{np}$, 

where $s_{jp} = r_{z_jF_p}$. The coefficients of the unknown β’s are the elements of the symmetric 
matrix of observed correlations. Thus, any factor can be estimated when the 
correlations of the variables with the factor and the correlations among the variables 
themselves are known. 

The formal solution of equations (16.14) can best be indicated by considering the 
matrix: 

(16.15) $\Delta = \begin{pmatrix} 1 & s_{1p} & s_{2p} & \cdots & s_{np} \\ s_{1p} & 1 & r_{12} & \cdots & r_{1n} \\ s_{2p} & r_{21} & 1 & \cdots & r_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ s_{np} & r_{n1} & r_{n2} & \cdots & 1 \end{pmatrix}$, 

consisting of the observed correlation matrix R bordered by the correlations of the 
variables with the particular factor. Then the unknowns in (16.14) are given by: 

(16.16) $\beta_{pj} = -|\Delta_{jp}| / |R|$   (j = 1, 2, ..., n), 

where the fixed denominator is the determinant of the observed correlation matrix 
and $|\Delta_{jp}|$ is the cofactor of $s_{jp}$ in $\Delta$. It should be noted that the determinants $|\Delta_{jp}|$ 
also can be expressed in terms of the cofactors of the original correlation matrix R. 
Thus the values (16.16) may be written explicitly in the form 

(16.17) $\beta_{pj} = (s_{1p}|R_{1j}| + s_{2p}|R_{2j}| + \cdots + s_{np}|R_{nj}|) / |R|$, 

where $|R_{kj}|$ is the cofactor of $r_{kj}$ in R. 

Substituting (16.17) into (16.13) and employing the more compact matrix designations, 
the estimate of any one of the factors $F_p$ may be put in the form: 

(16.18) $\hat{F}_p = s_p'R^{-1}z$   (p = 1, 2, ..., m), 

where $s_p$ is the column vector $\{s_{1p}\ s_{2p}\ \cdots\ s_{np}\}$ taken from column p of the factor 
structure S. By considering all m factors and the N measurements of each variable, 
the measurements of all the factors can be expressed compactly, as follows: 

(16.19) $\hat{F} = S'R^{-1}Z$. 

While the foregoing development employed the factor structure, it is possible to 
estimate the factors by use of the factor pattern instead. Substituting the expression 
(2.43) for the structure in terms of the pattern, equation (16.19) becomes 

(16.20) $\hat{F} = \Phi A'R^{-1}Z$, 

in which $\Phi$ is the matrix of correlations among the factors. When the factors are 
uncorrelated, $\Phi$ in (16.20) is the identity matrix, and the matrix equation for the 
prediction of such factors becomes 

(16.21) $\hat{F} = A'R^{-1}Z$   (uncorrelated factors), 

which is also immediately evident from (16.19) since the pattern and structure coincide. 
If the factors are correlated, of course, the distinction between a pattern and a structure 
is of paramount importance. 
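
A small numerical sketch may help fix the roles of pattern and structure in (16.19) and (16.20). The oblique pattern, the factor correlations, and the standard scores below are all invented for the illustration (they do not correspond to any table in the text), and NumPy is assumed.

```python
import numpy as np

# Hypothetical oblique pattern (n = 6 variables, m = 2 factors) and
# hypothetical correlation between the two factors.
A = np.array([[0.8, 0.0],
              [0.7, 0.1],
              [0.6, 0.2],
              [0.1, 0.7],
              [0.0, 0.8],
              [0.2, 0.6]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])

# Correlation matrix implied by the model: reproduced correlations with
# unities placed in the diagonal.
R = A @ Phi @ A.T
np.fill_diagonal(R, 1.0)

# Structure from pattern, relation (2.43): S = A Phi.
S = A @ Phi

# (16.19): weights S' R^{-1}; (16.20): the same weights written as Phi A' R^{-1}.
W_structure = np.linalg.solve(R, S).T
W_pattern = Phi @ A.T @ np.linalg.inv(R)

# Placeholder standard scores, n x N, only to show the shapes involved.
rng = np.random.default_rng(1)
Z = rng.standard_normal((6, 10))
F_hat = W_structure @ Z            # m x N matrix of estimated factor measurements

print(np.round(W_structure, 4))
```

Using the pattern A in place of the structure S without the factor $\Phi$ would give different weights whenever the factors are correlated, which is the distinction stressed above.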

A measure of the accuracy of estimating a factor $F_p$ by means of equation (16.13) is 
given by the coefficient of multiple correlation $R_p$. Several important and useful 
formulas involving $R_p$ will be developed next. The normal equations (16.14) may be 
written in the condensed form 

(16.22) $\sum (F_{pi} - \hat{F}_{pi})z_{ji} = 0$   (j = 1, 2, ..., n), 

where the summation extends over the N observations of each variable. In the following 
development, summation on i from 1 to N will be understood although the i is 
omitted. Since the set of residuals $(F_p - \hat{F}_p)$ is orthogonal to each of the n sets of 
numbers $z_j$, it is orthogonal to any linear combination of these $z_j$. In particular, the 
set $\hat{F}_p$ is such a linear combination, and hence 

$\sum (F_p - \hat{F}_p)\hat{F}_p = 0$, 

which, upon dividing through by N, reduces to 

(16.23) $s_{\hat{F}_p} r_{F_p\hat{F}_p} = s_{\hat{F}_p}^2$, 

since $F_p$ is in standard form. The coefficient of multiple correlation of $F_p$ in terms of 
$z_1, z_2, \ldots, z_n$ is defined to be the simple correlation coefficient of $F_p$ and $\hat{F}_p$. The 
expression (16.23) may finally be written in the form 

(16.24) $R_p = r_{F_p\hat{F}_p} = s_{\hat{F}_p}$. 

This formula shows that the standard deviation of the factor estimates is equal to the 
coefficient of multiple correlation. 

A simple formula for calculating $R_p$ may be obtained by multiplying both sides of 
(16.13) by $F_p$, summing for the N individuals, and dividing by N. The result is 

$s_{\hat{F}_p} r_{F_p\hat{F}_p} = \beta_{p1}s_{1p} + \beta_{p2}s_{2p} + \cdots + \beta_{pn}s_{np}$, 

where the symbol for the sample standard deviation ($s_{\hat{F}_p}$) should not be confused with 
the symbol for a factor structure element ($s_{jp}$). Then, according to (16.24), 

(16.25) $R_p^2 = \beta_{p1}s_{1p} + \beta_{p2}s_{2p} + \cdots + \beta_{pn}s_{np}$. 

While (16.25) is the simplest formula for computing the multiple correlation, 
another form can be derived with other useful features. To this end, multiply the 


first of equations (16.14) by $\beta_{p1}$, the second by $\beta_{p2}$, etc., obtaining 

(16.26) 
$\beta_{p1}s_{1p} = \beta_{p1}^2 + \beta_{p1}\beta_{p2}r_{12} + \cdots + \beta_{p1}\beta_{pn}r_{1n}$, 
$\cdots$ 
$\beta_{pn}s_{np} = \beta_{pn}\beta_{p1}r_{n1} + \beta_{pn}\beta_{p2}r_{n2} + \cdots + \beta_{pn}^2$. 

Adding these equations, and employing (16.25), there results 

(16.27) $R_p^2 = \sum_{j=1}^{n} \beta_{pj}^2 + 2\sum_{j<k=1}^{n} \beta_{pj}\beta_{pk}r_{jk}$. 

This formula, although not so simple as (16.25) for computing $R_p$, illustrates an 
important property. Any product term $\beta_{pj}s_{jp}$ in (16.25) measures the total (direct and 
indirect) contribution of the corresponding variable $X_j$ to $R_p^2$, or the importance of 
that variable as a “determiner” of $F_p$. The resolution of the total contribution of any 
variable into its direct and indirect effect upon $F_p$ is indicated in (16.26). Thus, while 
$\beta_{p1}s_{1p}$ represents the total portion of $R_p^2$ which is due to $X_1$, the right-hand member 
of the first of equations (16.26) shows that this is composed of the direct contribution 
($\beta_{p1}^2$) of $X_1$ and of the indirect contribution ($\beta_{p1}\beta_{pk}r_{1k}$) of $X_1$ through its correlations 
with each of the other variables $X_k$ (k = 2, 3, ..., n). It may be noted that the indirect 
or joint contribution of any two variables is distributed equally between them. 
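
The resolution into direct and indirect contributions can be traced on a small made-up example. The sketch below assumes an orthogonal two-factor pattern invented for the purpose (so that structure and pattern coincide), forms the correlation matrix the model implies, solves the normal equations (16.14) for one factor, and then evaluates (16.25), (16.26), and (16.27). NumPy and all names are assumptions of the example.

```python
import numpy as np

# Hypothetical orthogonal pattern: six variables, two uncorrelated factors.
A = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.6, 0.0],
              [0.1, 0.7],
              [0.2, 0.6],
              [0.0, 0.8]])
D2 = np.diag(1.0 - (A ** 2).sum(axis=1))   # uniquenesses
R = A @ A.T + D2                           # correlation matrix implied by the model

p = 0                                      # work with the first factor
s = A[:, p]                                # correlations of the variables with the factor
beta = np.linalg.solve(R, s)               # normal equations (16.14): R beta = s

# Contributions of each variable to the squared multiple correlation.
total = beta * s                           # terms of (16.25)
direct = beta ** 2                         # squared weights, first part of (16.26)
indirect = beta * (R @ beta) - direct      # sum over k != j of beta_j beta_k r_jk

# Squared multiple correlation by (16.25) and by (16.27).
R2_from_1625 = total.sum()
cross = np.triu(np.outer(beta, beta) * R, k=1).sum()
R2_from_1627 = direct.sum() + 2.0 * cross

# Variance of the estimated factor, which by (16.24) should equal R_p squared.
var_of_estimate = beta @ R @ beta

print(np.round(total, 4), np.round(direct, 4), np.round(indirect, 4))
print(round(R2_from_1625, 4), round(R2_from_1627, 4), round(var_of_estimate, 4))
```

Each entry of `indirect` collects half of every joint term in which the variable takes part, which is the equal division of joint contributions noted above.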

In the preceding section involving component analysis it was demonstrated that 
the principal components (or orthogonal rotations of them) are linear combinations 
of the original variables involving no estimation. One immediate implication of this 
is that the multiple correlation for any principal component must be precisely unity. 
It must also be unity for any rotation of the principal components. On the other 
hand, when the analysis is in terms of classical factor analysis then the multiple 
correlation will be less than one for each factor. The extent of this reduction can be 
seen from the following analysis. 

Suppose the factor coefficients have been estimated by the principal-factor method 
of 8.2, that is, the columns of factor loadings are appropriately normalized eigenvectors 
of $R - D^2$, where R is the full correlation matrix and $D^2$ is the diagonal 
uniqueness matrix. Designating the eigenvalues by $\lambda_p$ and the corresponding unit-length 
eigenvectors by $\alpha_p$, it follows from (8.15) that 

(16.28) $(R - D^2)\alpha_p = \lambda_p\alpha_p$   (p = 1, 2, ..., m) 

and 

(16.29) $\alpha_p'\alpha_p = 1$. 

The column vector $a_p$ of factor coefficients is scaled according to (8.14) as follows: 

(16.30) $a_p = \sqrt{\lambda_p}\,\alpha_p$. 

Now, let p p be the column vector of regression coefficients for the estimation of 
F p from z x , • • •, z„. Also let s p designate the column vector of correlations of the 
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variables with factor F p . Then, the expression (16.17) for the regression coefficients 
can be put in the following matrix form: 

(16.31) P P = R S = R “V 

where the last equality follows from the fact that the structure elements are identical 
with the pattern coefficients in the case of an orthogonal factor solution. The ex¬ 
pression (16.25) for the multiple correlation similarly can be put in matrix form, 
as follows: 

(16.32) = s pPp- 

Substituting the value for P p from (16.31), the last expression becomes 

(16.33) R p = a p R a p’ 
or, making use of (16.30), 

(16.34) R 2 P = X ( A P a p)’ 

where the last equality follows from the fact that X p is a scalar. Then, according to 
the basic relationship (16.28), the last equation can be put in the form 

Rj=a;R- x (R-D 2 )a p 

= a;(I - R~ x D 2 )a p 

= -a;R' x D 2 a p , 

and, finally 

(16.35) R 2 P = 1 - a;R -1 D 2 a p , 

by use of (16.29). This indicates the extent to which the multiple correlation is reduced 
from unity when the factor solution includes uniqueness variance. On the other 
hand, when the correlation matrix with ones in the diagonal is factored, then D - 0 
and the multiple correlation is exactly one for each principal component. 
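The equality of (16.33) and (16.35) can be checked numerically. The following is only a minimal sketch: the loading matrix `A` is a made-up illustration (it does not come from any table in the text), and NumPy's `eigh` is used in place of the hand methods of chapter 8.

```python
import numpy as np

# Hypothetical two-factor loadings for four variables (illustrative values only).
A = np.array([[0.8, 0.2],
              [0.7, 0.3],
              [0.3, 0.7],
              [0.2, 0.8]])
D2 = np.diag(1.0 - np.sum(A**2, axis=1))    # diagonal matrix of uniquenesses
R = A @ A.T + D2                             # correlation matrix with unit diagonal

# Principal-factor solution: eigenvalues and unit eigenvectors of R - D^2, as in (16.28).
eigvals, eigvecs = np.linalg.eigh(R - D2)
order = np.argsort(eigvals)[::-1][:2]        # keep the m = 2 largest roots
lam = eigvals[order]
alpha = eigvecs[:, order]                    # unit-length eigenvectors, (16.29)
a = alpha * np.sqrt(lam)                     # scaled factor coefficients, (16.30)

R_inv = np.linalg.inv(R)
# (16.33): squared multiple correlation from the factor coefficients.
r2_direct = np.array([a[:, p] @ R_inv @ a[:, p] for p in range(2)])
# (16.35): the same quantity expressed as a reduction from unity.
r2_reduced = np.array([1.0 - alpha[:, p] @ R_inv @ D2 @ alpha[:, p] for p in range(2)])

print(r2_direct, r2_reduced)
```

Setting `D2` to a zero matrix in this sketch corresponds to component analysis; the reduction term in (16.35) then vanishes.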

16.5. Numerical Illustrations of Complete Estimation Method 

It is apparent that the estimation of factor measurements involves considerable
computation. The general formula (16.19) requires the inverse of a matrix of order n,
which can be a very laborious task as n becomes large. Hence, the complete estimation
method is practical only with large computers. An example of such computer
results is given, as well as a more detailed illustration of the work on a desk calculator,
for a manageable set of data.

To better grasp the nature of the estimation, the hand procedures will be taken
up first. While the determinantal method was employed to get formula (16.16) for
the regression coefficients, in practical applications the more efficient methods of
chapter 3 are employed for the solution of systems of linear equations. Since each
set of normal equations (16.14) for estimating the successive common factors involves
the same matrix of coefficients, all the factors can be estimated simultaneously.
Furthermore, this means that several sets of factors, obtained by different methods
of analysis, can be predicted at the same time. In the first example, the two principal
factors and two oblique factors are estimated for the eight physical variables. The
complete work, by the square root method, is shown in Table 16.3. The observed
correlations come from Table 5.3; the correlations of the variables with the principal
factors $P_1$ and $P_2$ come from Table 8.12; and the correlations of the variables with
the oblique factors $T_1$ and $T_2$ come from a factor structure based on a centroid
solution which is very similar to the factor structure in Table 13.1 based on a minres
solution. The arrangement of the work in Table 16.3 follows the schematic form of
Table 3.2, but in place of a single dependent variable there are the four factors to be
estimated. After the square root operation is applied, yielding the blocks below the
known correlations, the regression coefficients are calculated by means of formulas
of the type (3.27). Thus the last regression coefficient for each factor is given by:

$\beta_{P_1 8} = .052/.723, \quad \beta_{P_2 8} = .116/.723, \quad \beta_{T_1 8} = -.002/.723, \quad \beta_{T_2 8} = .117/.723;$

the next to last is given by:

$\beta_{P_1 7} = [.033 - .088(.072)]/.666, \quad \beta_{P_2 7} = [.071 - .088(.160)]/.666,$

$\beta_{T_1 7} = [-.002 - .088(-.003)]/.666, \quad \beta_{T_2 7} = [.066 - .088(.162)]/.666;$

and so on until the first regression coefficients are obtained. In addition to the regression
weights, the square of the multiple correlation for each factor estimate is obtained
by applying formula (16.25). The results of these calculations, and the multiple
correlations themselves, are recorded in the lower right corner of Table 16.3.
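The simultaneous solution for several factors can be sketched as follows. This is an illustration under stated assumptions, not the worksheet of Table 16.3: the matrices `R` and `S` are small made-up examples, and the square root operation is represented by NumPy's Cholesky factorization.

```python
import numpy as np

# Hypothetical correlations among three variables (not taken from Table 5.3).
R = np.array([[1.00, 0.60, 0.30],
              [0.60, 1.00, 0.40],
              [0.30, 0.40, 1.00]])
# Hypothetical correlations of the variables with two factors, one column per factor.
S = np.array([[0.90, 0.35],
              [0.80, 0.45],
              [0.40, 0.85]])

# Square root (Cholesky) factor: R = G G', with G lower triangular.
G = np.linalg.cholesky(R)
# Forward pass: the square root operation is applied to all right-hand columns at once.
Y = np.linalg.solve(G, S)
# Back solution: equivalent to working from the last coefficient up, for every factor.
B = np.linalg.solve(G.T, Y)        # column p holds the regression weights for factor p

# Squared multiple correlation for each factor, the sum of beta * s products as in (16.25).
R2 = np.sum(B * S, axis=0)
print(B)
print(R2, np.sqrt(R2))
```

Because the matrix of coefficients is factored only once, adding further factors (further columns of `S`) costs little extra work, which is the point made in the text.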

The values of the regression coefficients in the equations predicting the oblique
factors $T_1$ (Lankiness) and $T_2$ (Stockiness), as given in Table 16.3, may be written
explicitly as follows:

(16.36)

$\bar{T}_{1i} = .275 z_{1i} + .388 z_{2i} + .206 z_{3i} + .158 z_{4i} + .042 z_{5i} - .008 z_{6i} - .003 z_{7i} - .003 z_{8i},$

$\bar{T}_{2i} = -.043 z_{1i} + .131 z_{2i} - .069 z_{3i} + .022 z_{4i} + .609 z_{5i} + .199 z_{6i} + .078 z_{7i} + .162 z_{8i}.$

The measurements of these factors for a particular individual i are obtained by
substituting the appropriate standardized values in the above equations. The subscript
i has been included in these equations to indicate clearly that values are substituted
for the variables in order to get particular estimates of the factors. In general,
however, the secondary subscript is dropped for simplicity.

The prediction of the above factors will now be illustrated for two individuals
whose measurements for the eight variables are given in Table 16.4. The original
observations $X_{ji}$ are changed to standardized values $z_{ji}$ by applying formula (2.5).
The means and standard deviations required for this change also are given in Table
16.4. Upon substituting the values $z_{ji}$ into equations (16.36), the following results are
obtained:

Factor measurements for Case 1: $\bar{T}_{11} = -0.26$, $\bar{T}_{21} = 1.78$;

Factor measurements for Case 2: $\bar{T}_{12} = 1.21$, $\bar{T}_{22} = 0.53$.
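The substitution just described can be reproduced with a few lines of code. This is a sketch only: the means, standard deviations, and raw values are those of Table 16.4, while the regression weights are transcribed from equations (16.36), with the first weight for $T_1$ read as .275 (consistent with the direct contribution .076 in Table 16.5) and the first weight for $T_2$ read as negative; these two readings are assumptions about poorly printed figures.

```python
import numpy as np

# Means, standard deviations, and raw measurements from Table 16.4.
mean = np.array([63.96, 64.25, 17.10, 19.62, 119.22, 12.27, 31.21, 9.92])
sd = np.array([2.09, 2.50, 0.67, 0.86, 15.19, 0.66, 1.91, 0.67])
X = np.array([
    [63.98, 63.19, 16.89, 19.09, 149.25, 13.15, 34.37, 10.87],   # Case 1
    [66.34, 66.89, 17.99, 20.71, 125.50, 12.44, 32.52, 10.55],   # Case 2
])

# Regression weights of (16.36); rows are T1 (Lankiness) and T2 (Stockiness).
# The values .275 and -.043 are readings assumed from the surrounding tables.
B = np.array([
    [0.275, 0.388, 0.206, 0.158, 0.042, -0.008, -0.003, -0.003],
    [-0.043, 0.131, -0.069, 0.022, 0.609, 0.199, 0.078, 0.162],
])

Z = (X - mean) / sd        # standardized values, formula (2.5)
T = Z @ B.T                # row i holds the estimates of T1 and T2 for case i
print(np.round(T, 2))
```

With these inputs the estimates agree with the factor measurements quoted above to within rounding of the second decimal place.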

It can be seen from the eight standardized values for each girl in Table 16.4 that
the first is of the stocky type and the second is tall or lanky. While the original values
reveal these facts, they can be indicated more simply by the above factor measurements.
Although the estimated factors are not in standard form, their standard
deviations (.980 and .961) are nearly the same, and so the estimated values are
reasonably comparable. The measurements for the first girl indicate clearly that she
is almost two standard deviations above the mean in the factor “Stockiness” and
slightly below the average in “Lankiness.” The second girl is a less extreme type,
being less lanky than the other girl is stocky and also being above average in
“Stockiness.”


Table 16.4

Means and Standard Deviations of Eight Physical Variables,* and Values for Two Girls

| Variable j | Mean $\bar{X}_j$ | Standard deviation $s_j$ | Case 1: $X_{j1}$ | Case 1: $z_{j1}$ | Case 2: $X_{j2}$ | Case 2: $z_{j2}$ |
|---|---|---|---|---|---|---|
| 1. Height | 63.96 in. | 2.09 in. | 63.98 in. | 0.01 | 66.34 in. | 1.14 |
| 2. Arm span | 64.25 in. | 2.50 in. | 63.19 in. | -0.42 | 66.89 in. | 1.06 |
| 3. Length of forearm | 17.10 in. | .67 in. | 16.89 in. | -0.31 | 17.99 in. | 1.33 |
| 4. Length of lower leg | 19.62 in. | .86 in. | 19.09 in. | -0.62 | 20.71 in. | 1.27 |
| 5. Weight | 119.22 lb. | 15.19 lb. | 149.25 lb. | 1.98 | 125.5 lb. | 0.41 |
| 6. Bitrochanteric diameter | 12.27 in. | .66 in. | 13.15 in. | 1.33 | 12.44 in. | 0.26 |
| 7. Chest girth | 31.21 in. | 1.91 in. | 34.37 in. | 1.65 | 32.52 in. | 0.69 |
| 8. Chest width | 9.92 in. | .67 in. | 10.87 in. | 1.42 | 10.55 in. | 0.94 |

* For 305 fifteen-year-old girls.


In the foregoing illustrations standardized values were employed in the direct
application of equations (16.36). The calculation of the standardized values for many
variables for a large sample of individuals is laborious. The work can be greatly
reduced by formally expressing the equations of estimation in terms of observed
values by the use of formula (2.5). Thus, in general, an equation of the form (16.13)
may be written as follows:

(16.37)  $\bar{F}_p = \frac{\beta_{p1}}{\sigma_1} X_1 + \frac{\beta_{p2}}{\sigma_2} X_2 + \cdots + \frac{\beta_{pn}}{\sigma_n} X_n - C,$

where*

(16.38)  $C = \frac{\beta_{p1}}{\sigma_1} \bar{X}_1 + \frac{\beta_{p2}}{\sigma_2} \bar{X}_2 + \cdots + \frac{\beta_{pn}}{\sigma_n} \bar{X}_n.$

* The notation for the mean value of a variable should not be confused with the use of a bar
over a factor to designate its prediction by regression methods.

It should be noted that an estimated factor is not in standard form. Such a variable,
however, has a mean of zero and a standard deviation which is equal to the coefficient
of multiple correlation as shown by (16.24).
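The conversion from standard-score weights to raw-score weights can be sketched as below. The helper name `raw_score_form` is hypothetical; the data are those of Table 16.4, and the weights for $T_1$ are transcribed from (16.36) with the first weight assumed to read .275.

```python
import numpy as np

def raw_score_form(beta, mean, sd):
    """Return (coefficients, constant C) such that F = coefficients @ X - C."""
    beta = np.asarray(beta, dtype=float)
    coef = beta / np.asarray(sd, dtype=float)             # beta_pj / sigma_j
    const = float(coef @ np.asarray(mean, dtype=float))   # C, the weighted sum of means
    return coef, const

mean = np.array([63.96, 64.25, 17.10, 19.62, 119.22, 12.27, 31.21, 9.92])
sd = np.array([2.09, 2.50, 0.67, 0.86, 15.19, 0.66, 1.91, 0.67])
beta_T1 = np.array([0.275, 0.388, 0.206, 0.158, 0.042, -0.008, -0.003, -0.003])

coef, C = raw_score_form(beta_T1, mean, sd)

# Case 1 of Table 16.4, estimated directly from raw scores and via standardized values.
x_case1 = np.array([63.98, 63.19, 16.89, 19.09, 149.25, 13.15, 34.37, 10.87])
f_raw = coef @ x_case1 - C
f_std = beta_T1 @ ((x_case1 - mean) / sd)
print(round(f_raw, 2), round(f_std, 2))
```

The two routes give the same estimate; the raw-score route merely moves the standardization into the coefficients and the constant, so it need be done only once per factor rather than once per individual.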

The values of the factors estimated by equation (16.13), or (16.37), include both
positive and negative numbers. If it is desired to eliminate the negative values, a
transformation can be made to an arbitrary positive scale. This can be accomplished
by standardizing the values $\bar{F}_{pi}$ given by (16.37) and equating this variable to an
arbitrary variable in standard form. Thus if the arbitrary variable Y is assigned a
mean of 50 and standard deviation of 10, the required transformation can be written
in the form

(16.39)  $Y_p = 10 \bar{F}_p / R_p + 50,$

where the multiple correlation coefficient has been substituted for the standard
deviation of the estimated factor according to (16.24). Such transformations have
been found especially useful in psychological studies employing factor estimates
[245, 508].
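As a small illustration of (16.39), the function below rescales an estimated factor value. The function name and its default arguments are hypothetical; the inputs are the Lankiness estimates for the two girls and the multiple correlation .980 quoted in this section.

```python
def to_positive_scale(f_hat, multiple_r, mean=50.0, sd=10.0):
    """Rescale an estimated factor value to an arbitrary mean and standard deviation.

    The estimate is first standardized by dividing by the multiple correlation,
    which is the standard deviation of the estimated factor according to (16.24).
    """
    return sd * f_hat / multiple_r + mean

# Lankiness estimates for the two girls, with multiple correlation .980.
y_case1 = to_positive_scale(-0.26, 0.980)
y_case2 = to_positive_scale(1.21, 0.980)
print(round(y_case1, 1), round(y_case2, 1))
```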

An illustration of the proportions of the variance of one of the above estimated
factors due to the eight physical variables will now be given. The direct and indirect
contributions of these variables on the prediction of the factor $T_1$ = Lankiness are
indicated in Table 16.5. Each entry in the table proper represents the total indirect
contribution ($2\beta_{1j} \beta_{1k} r_{jk}$) of variables $X_j$ and $X_k$. The total indirect contribution of
any variable is equal to one-half of the sum in the row and column representing that
variable and is given in the last row of the table. The direct contributions are
given in the row preceding the last. The total contribution ($\beta_{1j} s_{j1}$) of each variable
is presented in the last column of the table. Of course, the sum of the direct and
indirect contributions of each variable must be equal to its total contribution.
Finally, the sum of the entries in the last column (or last two rows) is equal to the
square of the multiple correlation.

Table 16.5

Proportions of Variance of Computed $T_1$ Due to the Independent Variables

| Variable j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total contribution $\beta_{1j} s_{j1}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | | | | | | .253 |
| 2 | .182 | | | | | | | | .369 |
| 3 | .090 | .140 | | | | | | | .184 |
| 4 | .075 | .102 | .051 | | | | | | .141 |
| 5 | .011 | .013 | .007 | .006 | | | | | .020 |
| 6 | -.002 | -.002 | -.001 | -.001 | -.001 | | | | -.003 |
| 7 | -.001 | -.001 | -.000 | -.001 | -.000 | .000 | | | -.002 |
| 8 | -.000 | -.001 | -.000 | -.000 | -.000 | .000 | .000 | | -.001 |
| Direct contribution | .076 | .153 | .041 | .025 | .002 | .000 | .000 | .000 | $R^2 = .961$ |
| Indirect contribution | .177 | .216 | .143 | .116 | .018 | -.003 | -.002 | -.001 | |
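The bookkeeping behind such a table can be sketched in code. The correlation matrix `R` and the factor correlations `s` below are made-up three-variable values, not those of the eight physical variables; only the arithmetic of direct, indirect, and total contributions follows the description above.

```python
import numpy as np

# Hypothetical correlations among three variables and with one factor.
R = np.array([[1.00, 0.60, 0.30],
              [0.60, 1.00, 0.40],
              [0.30, 0.40, 1.00]])
s = np.array([0.90, 0.80, 0.40])

beta = np.linalg.solve(R, s)               # regression weights for the factor

# Joint contribution of each pair of variables: 2 * beta_j * beta_k * r_jk.
joint = 2.0 * np.outer(beta, beta) * R
np.fill_diagonal(joint, 0.0)               # the diagonal holds no joint terms

direct = beta**2                           # direct contributions
indirect = 0.5 * joint.sum(axis=1)         # half of each joint term goes to each variable
total = beta * s                           # total contributions, beta_j * s_j
r_squared = total.sum()                    # square of the multiple correlation

print(np.round(direct, 3), np.round(indirect, 3), np.round(total, 3), round(r_squared, 3))
```

Here `joint` is stored as a full symmetric matrix, so a row sum already equals the sum of the row and column of the triangular layout used in the printed table.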

Since it is a well known fact that adding variables to a regression equation can only 
increase a multiple correlation, one might wonder about the small negative numbers 
in the last column of Table 16.5. No doubt these are spurious, insignificant values 
that have arisen from the rather involved mathematical computations. Because of 
the limited number of decimal figures retained in the calculations, no significance 
should be attached to the values appearing in the third decimal place as a result of a 
matrix inversion. 

One practical use of the contributions of variables to the variance of a factor is to 
measure their relative importance for predictive purposes. In building a psychological 
test, for example, many items might be analyzed by factorial methods, and then the 
question might arise about the “importance” of the different items. A simple measure 
of validity is the correlation of an item with the factor. However, a much better 
indicator is the total contribution of an item to the variance of the computed factor. 
The correlation of an item with the factor does not reflect sufficiently the indirect 
contributions of the item through its correlations with each of the other items. 
Returning to the example of eight physical variables, this point might be made by
comparing the importance of variable 1 (Height) with variable 5 (Weight) as to their
effect on the prediction of factor $T_1$ (Lankiness). Their correlations with the factor
are .92 and .46, respectively; while their total contributions to the prediction of the
factor are .25 and .02, respectively. While one would not take direct ratios of these
pairs of numbers, nonetheless it seems that the large difference in the proportions of
the variance of the estimated factor makes the latter numbers much more meaningful
in judging the relative importance of the two variables.


Table 16.6

Coefficients of Regression Equations for Prediction of
Direct Oblimin ($\delta = 0$) Factors: Five Socio-Economic Variables

(Solution of Table 15.12)

| Form | Factor | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Variable 5 | Constant |
|---|---|---|---|---|---|---|---|
| Standard form | $T_1$ | -.5874 | -.0666 | .6742 | .0730 | .9136 | |
| Standard form | $T_2$ | 1.1214 | .1407 | -.1369 | .0201 | -.0639 | |
| Raw score form | $T_1$ | ... | ... | ... | ... | .0001 | -2.2912 |
| Raw score form | $T_2$ | ... | ... | ... | ... | -.0000 | -2.5289 |

As indicated above, the measurement of factors by the complete estimation method
is feasible only with an electronic computer, except for small problems. Just as an
illustration, such computations were carried out on an IBM 7094 for two oblique
factors for the five socio-economic variables. The factors considered are those determined
by the direct oblimin method in Table 15.12. The coefficients for the estimation
equations are given in Table 16.6, both for the variables in standard form and in raw
score form. Also the estimated measurements of the two oblique factors for each of
the twelve census tracts are presented in Table 16.7. It should be remembered that
the values in this table have been estimated, according to the least-squares principle,
while the values in Table 16.2 are exact measurements of the rotated principal
components.

Table 16.7

Measurements of Direct Oblimin ($\delta = 0$) Factors:
Five Socio-Economic Variables

(Solution of Table 15.12)

| Case | $T_1$ | $T_2$ |
|---|---|---|
| 1 | 1.38 | -.14 |
| 2 | -1.10 | -1.51 |
| 3 | -1.36 | -.93 |
| 4 | 1.15 | -.63 |
| 5 | 1.09 | -.62 |
| 6 | -.83 | .40 |
| 7 | -.40 | -1.44 |
| 8 | -.43 | .85 |
| 9 | .10 | 1.16 |
| 10 | 1.35 | 1.10 |
| 11 | -.72 | .89 |
| 12 | -.22 | .88 |


16.6. Approximation Method

Several approximations for estimating common factors have been proposed [194] 
to reduce the order of the matrix whose inverse is required. Such methods involve 
the grouping of certain variables into composites. The simplest procedure is as 
follows: Combine the respective subsets of variables which best measure the factors 
and employ formula (16.19) in which all the symbols now stand for the corresponding 
reduced matrices. This approximation method is best adapted to the case of many 
variables which fall into a relatively small number of distinct subgroups. In applying 
this formula to an example including, say, m subgroups, the m x m matrix of correla¬ 
tions among the composite variables is used in the place of R. Then the major portion 
of the calculations is greatly reduced. The effect of this procedure is to give all variables 
of a subgroup equal weight. 

If it is desired to give varying weights to some of the individual variables, the
preceding method can be modified slightly. For example, if the first four variables of
a set are the best measures of the first common factor, they may be used individually,
while all other variables are grouped into composites, for the estimation of this factor.
The matrix whose inverse is required then consists of the intercorrelations of the
first four variables and the (m - 1) composite variables. When the second, and
successive, factors are estimated, the corresponding variables which best measure
them are retained, while all other variables are grouped into composites.

To illustrate the approximation method, the eight physical variables are again
employed. Grouping the variables as in (13.10), and using the standard deviations of
(13.12), the composite variables may be written as follows:

$u_1 = v_1/s_{v_1} = (z_1 + z_2 + z_3 + z_4)/3.7465,$

$u_2 = v_2/s_{v_2} = (z_5 + z_6 + z_7 + z_8)/3.4117.$

The correlation between these composites can be calculated by means of formula
(11.10), employing the correlations of the original variables from Table 5.3 to get

$r_{v_1 v_2} = 5.686/12.782 = .4448.$

The estimate of the first oblique factor $T_1$, for example, can be made from the
regression equation

$\bar{T}_1 = \beta_{11} u_1 + \beta_{12} u_2.$

The normal equations (16.14) in this case are

$\beta_{11} + .4448 \beta_{12} = .977,$

$.4448 \beta_{11} + \beta_{12} = .455,$

where the correlations of the composite variables with $T_1$ are taken from the reduced
structure. Solving these equations for the $\beta$'s and substituting the results in the
regression equation produces

$\bar{T}_1 = .9656 u_1 + .0255 u_2.$

This equation cannot be used for estimating individual values of $T_1$ unless the values
of the composite variables are computed. Instead, by substituting the expressions for
$u_1$ and $u_2$, this equation becomes

$\bar{T}_1 = .258(z_1 + z_2 + z_3 + z_4) + .007(z_5 + z_6 + z_7 + z_8),$

illustrating the fact that all variables of a subgroup are given equal weight by the
approximation method. For the two girls, considered before, the values of the first
factor are $\bar{T}_{11} = -.30$ and $\bar{T}_{12} = 1.25$. The discrepancies between these values and
those previously obtained are due to the grouping of the variables. While the preceding
illustration makes possible direct comparison with the complete estimation
method, one would not ordinarily use the approximation method when only eight
variables are involved.
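The arithmetic of this illustration can be retraced in a short sketch. All figures are taken from the text (the composite standard deviations, the composite correlation, the two correlations with $T_1$, and the standardized values of Table 16.4); only the variable names are invented for the example.

```python
import numpy as np

# Standard deviations of the two composites and their correlation, as quoted in the text.
s_v = np.array([3.7465, 3.4117])
r_uu = 5.686 / 12.782
R_u = np.array([[1.0, r_uu],
                [r_uu, 1.0]])

# Correlations of the composites with T1 (right-hand members of the normal equations).
s_u = np.array([0.977, 0.455])
beta_u = np.linalg.solve(R_u, s_u)          # weights of u1 and u2

# Expressed in terms of the original standardized variables, every member
# of a subgroup receives the same weight.
w = beta_u / s_v
weights = np.repeat(w, 4)                   # z1..z4 share w[0]; z5..z8 share w[1]

# Standardized values of the two girls from Table 16.4.
Z = np.array([
    [0.01, -0.42, -0.31, -0.62, 1.98, 1.33, 1.65, 1.42],   # Case 1
    [1.14, 1.06, 1.33, 1.27, 0.41, 0.26, 0.69, 0.94],      # Case 2
])
T1 = Z @ weights
print(np.round(beta_u, 4), np.round(w, 3), np.round(T1, 2))
```

Only a 2 x 2 system is solved here, in place of the 8 x 8 system of the complete estimation method.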
16.7. Short Method

While the method of the preceding section involves much less work than the 
complete estimation method, it is a somewhat crude approximation only. In the 
attempt to more nearly replicate the results of complete regression but with reduced 
labor, Ledermann [337] developed a “shortened method” for estimating factors by 
regression, and Harman [197] generalized the method to the case of oblique factors. 
To the extent that the factor solution is adequate (i.e., the residuals vanish), the 
results of the shortened method will approach those of the complete estimation 
method. In effect, the method to be developed in this section is a complete regression 
method applied to reproduced correlations rather than observed correlations. By 
making the assumption that the reproduced correlations are equal to the observed 
correlations, there derives the great advantage of replacing the nth order matrix of 
correlations by a matrix of order m (the number of common factors). Since the number 
of such factors usually is small by comparison to the number of variables, the labor 
of computing the inverse matrix is greatly reduced. 

Stated explicitly, the assumption made in this section is that

(16.40)  $R^{\dagger} + D^2 = R,$

i.e., the reproduced correlations, with ones in the diagonal, are set equal to the
observed correlations. To avoid unnecessarily awkward notations, R will be employed
for the correlations resulting from a factor analysis. Then, according to (2.47), this
matrix can be expressed as follows:

(16.41)  $R = (A \;\; D) \begin{pmatrix} \Phi & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} A' \\ D \end{pmatrix} = (A\Phi \;\; D) \begin{pmatrix} A' \\ D \end{pmatrix} = A \Phi A' + D^2.$

This relation will now be used in simplifying formula (16.20) for the estimation of
the m common factors, permitted to be oblique in order to cover the most general
case. Premultiplying both sides of (16.41) by $A'D^{-2}$ there arises*

(16.42)  $A'D^{-2}R = A'D^{-2}(A \Phi A' + D^2) = (A'D^{-2}A\Phi + I)A'.$

Then, defining the following m x m matrix,

(16.43)  $K = A'D^{-2}A\Phi,$

the expression (16.42) can be put in the form:

(16.44)  $(I + K)A' = A'D^{-2}R.$

Now, premultiplying both members of this equation by $(I + K)^{-1}$ and postmultiplying
by $R^{-1}$, it becomes

(16.45)  $A'R^{-1} = (I + K)^{-1}A'D^{-2}.$

* Throughout this and the following section it is tacitly assumed that none of the uniquenesses
vanishes. For an excellent treatment of the contrary case see Guttman [172].

Then, substituting this expression for $A'R^{-1}$ in (16.20), there finally results

(16.46)  $\bar{f} = \Phi (I + K)^{-1} A'D^{-2} z,$

where only the column vectors for the factors and variables are represented instead
of the complete matrices for the N individuals. Although this formula may appear
to be more complex than (16.20), it is actually much simpler to apply. Aside from
inverting the squares of the unique-factor coefficients, the only matrix whose inverse
must be calculated is of order m.

For the estimation of common factors by the short method, it is convenient to
write formula (16.46) in another form. Premultiply both sides of this equation by
$[\Phi (I + K)^{-1}]^{-1}$ to get

(16.47)  $(I + K)\Phi^{-1} \bar{f} = A'D^{-2} z.$

It will be observed that the resulting matrix on each side of (16.47) is of order m x 1.
This matrix equation represents a system of m algebraic equations, obtained by
setting the corresponding elements equal to each other. The matrices in the right-hand
member of (16.47) are quite simple, but the expression on the left appears to be
rather complex. The latter may be put in a more convenient form by substituting the
definition of K, producing

(16.48)  $(I + K)\Phi^{-1} = (I + A'D^{-2}A\Phi)\Phi^{-1} = (\Phi^{-1} + A'D^{-2}A).$

Finally, the system of m equations for estimating the common factors may be written
in the matrix form:

(16.49)  $\bar{f} = L^{-1} A'D^{-2} z,$

where

(16.50)  $L = \Phi^{-1} + J \quad \text{and} \quad J = A'D^{-2}A.$

In case the common factors are uncorrelated, $\Phi$ is the identity matrix, and formula
(16.49) reduces to

(16.51)  $\bar{f} = (I + J)^{-1} A'D^{-2} z \qquad \text{(uncorrelated factors)}.$

The recommended procedure in applying formula (16.51) to a numerical problem is
as follows: Divide each element of the jth row of A by $d_j^2$; this gives the matrix $D^{-2}A$
which occurs in J and the transpose of which also occurs in the right-hand member
of (16.51). Then multiply A by $D^{-2}A$, column by column,* to get the m x m matrix J.
Finally, add unity to each diagonal element of J to complete the determination of
the m equations represented by (16.51). Then the solution for the common factors can
be carried out by the square root method of 3.4.

* This is the same as the conventional row-by-column multiplication of $A'$ by $D^{-2}A$.

The procedure for estimating a set of correlated factors by means of (16.49) is
quite similar to the preceding. After the matrix J is determined, it must be added to
$\Phi^{-1}$ instead of the identity matrix. The procedure for calculating the inverse of a
matrix is given in 3.5.
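Formula (16.49) can be sketched numerically and compared with complete regression on the reproduced correlations. The pattern `P`, the factor correlation .484, and the uniquenesses `u` are transcribed from the worksheet for the eight physical variables (Table 16.8B); `np.linalg.solve` stands in for the square root method, and the comparison matrix `B_table` holds the last two lines of that worksheet.

```python
import numpy as np

# Oblique primary-factor pattern (8 variables x 2 factors), from Table 16.8B.
P = np.array([[0.894, 0.051], [0.956, -0.027], [0.932, -0.052], [0.879, 0.029],
              [0.005, 0.930], [-0.025, 0.825], [-0.060, 0.769], [0.080, 0.685]])
Phi = np.array([[1.0, 0.484],
                [0.484, 1.0]])               # correlations among the primary factors
u = np.array([0.154, 0.111, 0.175, 0.202, 0.132, 0.339, 0.450, 0.471])  # uniquenesses

C = P / u[:, None]                  # D^{-2} P, so that C' = P' D^{-2}
J = P.T @ C                         # J = P' D^{-2} P, an m x m matrix
L = np.linalg.inv(Phi) + J          # (16.50)
B_short = np.linalg.solve(L, C.T)   # (16.49): row p holds the weights for factor p

# Complete regression applied to the reproduced correlations, B' = Phi P' R^{-1}.
R = P @ Phi @ P.T + np.diag(u)
B_full = Phi @ P.T @ np.linalg.inv(R)

# Weights recorded in the last two lines of Table 16.8B.
B_table = np.array([[0.248, 0.366, 0.226, 0.186, 0.020, 0.003, -0.001, 0.011],
                    [0.042, 0.002, -0.012, 0.023, 0.578, 0.199, 0.140, 0.120]])
print(np.round(L, 3))
print(np.round(B_short, 3))
```

Only the 2 x 2 matrices `Phi` and `L` are inverted or solved in the short route; the 8 x 8 inverse appears solely in the comparison.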

A compact procedure for estimating factors by the short method, employing the 
square root method with a desk calculator, is outlined in Table 16.8 and illustrated 
with a numerical example. In order to demonstrate the most general case, the work¬ 
sheet is set up to determine the regression coefficients for oblique factors. The primary- 
factor pattern in this table was obtained from a centroid solution and therefore is 
slightly different from that obtained in Table 13.1, which was derived from a minres 
solution. In Table 16.8, P is used to designate the oblique factor pattern, in place of 
the general symbol A of the formulas above. This is done to emphasize that the 
primary-factor system of oblique factors rather than the reference factors are the 
ones being estimated. 


Table 16.8

Short Method of Estimating Factors by Regression

A. Computing Algorithm

| Variable | Supplementary Matrices ($T_1 \cdots T_m$, $T_1 \cdots T_m$) | Variables in Basic Matrices ($1, 2, \ldots, n$) |
|---|---|---|
| $T_1, \ldots, T_m$ | $\Phi$ and $I$ | $P'$ (Transpose of factor pattern of factors for which regression equations are desired) |
| | Square root method of Table 3.3 is applied to compute $\Phi^{-1}$ | Uniquenesses from diagonal matrix $D^2$ (From initial orthogonal solution even if P is oblique) |
| | | Diagonal of $D$ (Square roots of preceding numbers) |
| $T_1, \ldots, T_m$ | $\Phi^{-1}$; J is obtained by row-by-row multiplication of $P'D^{-1}$ by itself | $P'D^{-1}$ (Entries in $P'$ divided by number in corresponding column of preceding line) |
| | $L = \Phi^{-1} + J$ (Let $L = E'E$) | $C' = P'D^{-2}$ (Entries in $P'$ divided by uniqueness for corresponding variable) |
| | Square root of matrix L applied to entire table (Operator $EL^{-1}$); $E$ | $EL^{-1}C'$ (Square root operator applied to preceding matrix $C'$) |
| $T_1, \ldots, T_m$ | | $B' = L^{-1}C'$ (Back solution, using formulas of type (3.27), computing the last line first and working up) |

Table 16.8 (continued)

B. Example of Eight Physical Variables
(Oblique Primary-Factor Solution)

The first four numerical columns are the supplementary matrices; the remaining eight are the variables in the basic matrices.

| Variable | $T_1$ | $T_2$ | $T_1$ | $T_2$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $T_1$ | 1.000 | .484 | 1.000 | 0 | .894 | .956 | .932 | .879 | .005 | -.025 | -.060 | .080 |
| $T_2$ | * | 1.000 | 0 | 1.000 | .051 | -.027 | -.052 | .029 | .930 | .825 | .769 | .685 |
| | 1.000 | .484 | 1.000 | 0 | .154 | .111 | .175 | .202 | .132 | .339 | .450 | .471 |
| | | .875 | -.553 | 1.143 | .392 | .333 | .418 | .449 | .363 | .582 | .671 | .686 |
| $T_1$ | | | 1.306 | -.632 | 2.281 | 2.871 | 2.230 | 1.958 | .014 | -.043 | -.089 | .117 |
| $T_2$ | | | * | 1.306 | .130 | -.081 | -.124 | .065 | 2.562 | 1.418 | 1.146 | .999 |
| | | | 23.542 | -.731 | 5.805 | 8.613 | 5.326 | 4.351 | .038 | -.074 | -.133 | .170 |
| | | | * | 12.218 | .331 | -.243 | -.297 | .144 | 7.045 | 2.434 | 1.709 | 1.454 |
| | | | 4.852 | -.151 | 1.196 | 1.775 | 1.098 | .897 | .008 | -.015 | -.027 | .035 |
| | | | | 3.492 | .147 | .007 | -.041 | .080 | 2.018 | .696 | .488 | .418 |
| $T_1$ | | | | | .248 | .366 | .226 | .186 | .020 | .003 | -.001 | .011 |
| $T_2$ | | | | | .042 | .002 | -.012 | .023 | .578 | .199 | .140 | .120 |


The plan of computations in Table 16.8 was developed under the premise that the
reference structure matrix V is the key component of the oblique simple-structure
solution but that measurements of the primary factors are desired. In this type of
multiple-factor analysis the sequence of calculations usually is as follows (see 13.5):

1. $A$, initial orthogonal solution;

2. $\Lambda$, transformation matrix to reference structure;

3. $V$, reference structure in rotated simple-structure solution;

4. $\Phi$, correlations among primary factors;

5. $D$, diagonal matrix required to normalize the rows of $\Lambda^{-1}$ (not to be confused
with the matrix of unique-factor coefficients);

6. $P = VD^{-1}$, primary-factor pattern in rotated simple-structure solution (can
also be obtained by transformation of the initial orthogonal solution, viz.,
$P = A(T')^{-1}$, where $T' = D\Lambda^{-1}$);

7. $S = P\Phi$, primary-factor structure in rotated simple-structure solution.
After an oblique factor structure matrix V is obtained by the methods of chapters 
13 or 15, it is converted to a primary-factor pattern P to provide the basic data for 
Table 16.8. Then the entries in this pattern are divided by the square-roots of the 
respective uniquenesses to obtain the matrix in the right-hand portion of the table, 
while the inverse of the matrix <S> of correlations among the factors is calculated in 
the left-hand portion. Next, the matrix L is determined on the left, and the matrix C 
alongside it on the right. Application of the square root of matrix L to the entire 
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Table 16.9 

Short Method of Estimating Factors, Including Calculation of R _1 

A. Computing Algorithm 


Variable 

Supplementary Matrices 

Variables in Basic Matrices 

T l - ■ ■ T m 

• • • T„, 

12. . 

T 

T m 

O 

I 

P' (Transpose of factor pattern of factors for which 
regression equations are desired) 


Square root method of Table 3.3 
is applied to compute O ” 1 

Uniquenesses from diagonal matrix D 2 (From initial or¬ 
thogonal solution even if P is oblique) 

Diagonal of D (Square roots of preceding numbers) 

T 

T m 


O 1 

P'D ” 1 (Entries in P' divided by number in corresponding 
column of preceding line) 

J is obtained by row-by¬ 
row multiplication of 
P'D ” 1 by itself 

L = O ” 1 + J 
(Let L = E'E) 

C' = P'D ” 2 (Entries in P' divided by uniqueness for corres¬ 
ponding variable) 

Square root of matrix L 
applied to entire table 
(Operator EL” ‘) 

E 

EL” i C (Square root operator applied to preceding matrix 

C') 




Diagonal of D ” 2 (Reciprocals of uniquenesses) 

1 

n 



R ” 1 = D ” 2 - CL^C' (Compute CL _ 1 C' by column-by- 
column multiplication of EL^C' by itself, and subtract 
from D” 2 ) 

Actually inverse of (R f + D 2 ) as approximation to R” 1 

T 

T m 



S' = OP' (Column-by-column multiplication of O and P') 

T 

T,„ 



B' = S'R” 1 (Row-by-row multiplication of S' and R” ‘) 
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Table 16.9 —Continued 

B. Example of Eight Physical Variables 


(Oblique Primary-Factor Solution) 


I 

Sup 

plemer 

itary M 

itrices 

Variables in Basic Matrices 


t 2 

T 

m 

fl 

2 

— 

— 

5 

6 

7 

8 



i 

1.000 

0 

0 

1.000 


.956 

-.027 


.879 

.029 

.005 

.930 


-.060 

.769 

.080 

.685 

1 



1.000 

-.553 

0 

1.143 


.111 


.202 




.471 


.333 


.449 




.686 

T 

t 2 

| 

I 

1.306 

* 

-.632 

1.306 


2.871 

-.081 


1.958 

.065 




.117 

.999 

■ 

■ 

■ 

26.542 

* 

-.731 

12.218 


8.613 

-.243 


4.351 

.144 

■ 


■ 

■ 

1 

■ 


4.852 

-.151 

3.492 


1.775 

.007 

1.098 

-.041 



-.015 

.696 

-.027 

.488 

.035 

.418 

■ 

■ 





9.009 

5.714 

4.950 

7.576 

2.950 

2.222 

2.123 

1 

2 

3 

4 

5 

6 

7 

8 





5.042 

* 

* 

* 

* 

* 

* 

* 

-2.124 

5.858 

* 

* 

* 

* 

* 

* 

-1.307 

-1.949 

4.507 

* 

* 

* 

* 

* 

-1.085 

-1.593 

-.982 

4.139 

* 

* 

* 

* 

-.306 

-.028 

.074 

-.169 

3.504 

* 

* 

* 

-.084 

.022 

.045 

-.042 

-1.404 

2.465 

* 

* 

-.039 

.045 

.050 

-.015 

-.985 

-.340 

1.983 

* 

-.103 
-.065 
-.021 
-.065 
-.844 
-.290 
- .203 
1.947 

T 

t 2 

| 

■ 



.919 

.484 

.943 

.435 

.907 

.399 

.893 

.454 

.455 

.932 

.374 

.813 

.312 

.740 

.412 

.724 


■ 



■ 

.251 

.045 

.365 

-.003 

.229 

-.001 


.023 

.577 

.004 

.202 

B 

.013 

.121 


table yields $\mathbf{E}\mathbf{L}^{-1}\mathbf{C}'$ in the right-hand block. Now, from (16.49) the matrix of
β-coefficients may be written in the form:

(16.52) $\mathbf{B}' = \mathbf{L}^{-1}\mathbf{C}'$,

in which $\mathbf{C}'$ as defined in Table 16.8 corresponds to $\mathbf{A}'\mathbf{D}^{-2}$ of (16.49). In order to
get the right-hand member of (16.52) from $\mathbf{E}\mathbf{L}^{-1}\mathbf{C}'$ it is only necessary to premultiply
this expression by $\mathbf{E}^{-1}$. Without actually calculating the inverse of the triangular
matrix $\mathbf{E}$, the desired result can be accomplished by applying formulas of the type

The last two lines of Table 16.8B contain the β-weights for the equations giving
$T_1$ and $T_2$ in terms of the standardized variables. By comparison with equations (16.36)
obtained by complete regression, it will be noted that the agreement is excellent,
minor discrepancies being due to the fact that the correlations computed from the
factor pattern are not exactly equal to the corresponding observed correlations. The
coefficients of multiple correlation, $R_1 = .981$ and $R_2 = .960$, indicate that the present
method for predicting $T_1$ and $T_2$ is just as reliable as the equations (16.36), for which
$R_1 = .980$ and $R_2 = .961$. For the two girls whose measurements on the eight
physical variables are given in Table 16.4 the values of the factors, as estimated by
the short method, are

$T_{11} = -0.27$, $T_{21} = 1.80$,

$T_{12} = 1.23$, $T_{22} = 0.52$,

which are practically identical with the values previously obtained. 

It should be remembered that the method developed in this section is dependent
upon the vanishing of the residuals resulting from the factor analysis. If the condition
(16.40) is satisfied only approximately then the β-weights and multiple correlations
produced by the short method will approach the actual values as the residuals
approach zero. Dwyer [108, p. 216] suggests that the multiple correlation resulting
from the use of the factor solution will generally differ in absolute value from the
actual multiple R determined from the observed correlations by an amount
approximately equal to the average of the absolute residual error.

In the procedure just outlined the results of the factor analysis were employed to
obviate the calculation of $\mathbf{R}^{-1}$ required in the basic formula (16.19) of the complete
regression method. It is also possible to obtain an approximation to $\mathbf{R}^{-1}$ via the
factor analysis, and then to get the regression weights by this formula. The computing
algorithm for this purpose is presented in Table 16.9A, and the procedure is illustrated
for the example of eight physical variables in Table 16.9B. The work through the
square root of matrix $\mathbf{L}$ is identical to that in Table 16.8. The remaining manipulations
in Table 16.9 lead to an approximation of $\mathbf{R}^{-1}$ and the calculations of the β's by
(16.19). Since this formula calls for the correlations of the variables with the primary
factors, such a factor structure is computed toward the bottom of Table 16.9.

An alternative procedure, in which P and S are not obtained explicitly, is employed
by Carroll [63] in an IBM 704 computer program (and subsequently adapted to later
model computers) to get “factor score coefficients” by the short method. The output
of this program must be multiplied by the matrix D to get the actual β-weights. If a
high-speed electronic computer is available, the determination of factor measurements
by the short method is certainly very practical and is the recommended
procedure.

While the worksheet of Table 16.9 is designed primarily for determining factor
measurements by the short method, there is inherent in this table a technique for
much more general statistical application. The reference is to the potential for a
simplified method of inverting a matrix, which operation is frequently required in
statistical work. For example, in multiple regression analysis involving n variables,
the inverse of a correlation matrix of order n is required. If a factor solution (any
factor solution) involving only m (much less than n) common factors were obtained,
considerable savings could be realized by employing the method of Table 16.9.
Ordinarily an orthogonal factor solution would be employed. Then the worksheet
would be simplified since the identity matrix $\mathbf{I}$ would replace $\boldsymbol{\Phi}^{-1}$ and only the
second square root operation would be involved. For further developments in
the utilization of factor analysis in multiple and partial regression the reader is
referred to Dwyer [108] and Creager [90].
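
The saving described above can be illustrated with a short sketch. It does not reproduce the worksheet of Table 16.9; it uses the standard Woodbury matrix identity, $\mathbf{R}^{-1} = \mathbf{D}^{-2} - \mathbf{D}^{-2}\mathbf{A}(\mathbf{I} + \mathbf{A}'\mathbf{D}^{-2}\mathbf{A})^{-1}\mathbf{A}'\mathbf{D}^{-2}$, which holds when $\mathbf{R} = \mathbf{A}\mathbf{A}' + \mathbf{D}^2$ for an orthogonal solution, so that only a matrix of order m needs to be inverted. The function names and the four-variable loadings are hypothetical and serve only as an illustration.

```python
def invert(M):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(M)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def inverse_from_factors(A, u):
    """Inverse of R = A A' + D^2 for an orthogonal pattern A (n x m) and
    uniquenesses u (diagonal of D^2), inverting only an m x m matrix."""
    n, m = len(A), len(A[0])
    Ct = [[A[j][p] / u[j] for j in range(n)] for p in range(m)]      # C' = A'D^-2
    L = [[(1.0 if p == q else 0.0)
          + sum(Ct[p][j] * A[j][q] for j in range(n))
          for q in range(m)] for p in range(m)]                      # L = I + A'D^-2 A
    L_inv = invert(L)
    M = [[sum(L_inv[p][q] * Ct[q][k] for q in range(m)) for k in range(n)]
         for p in range(m)]                                          # L^-1 C'
    return [[(1.0 / u[j] if j == k else 0.0)
             - sum(Ct[p][j] * M[p][k] for p in range(m))
             for k in range(n)] for j in range(n)]                   # D^-2 - C L^-1 C'


# Hypothetical orthogonal loadings for four variables on two factors.
A = [[0.8, 0.1], [0.7, 0.2], [0.2, 0.9], [0.1, 0.6]]
u = [1.0 - sum(a * a for a in row) for row in A]                     # uniquenesses
R = [[sum(x * y for x, y in zip(A[j], A[k])) + (u[j] if j == k else 0.0)
      for k in range(4)] for j in range(4)]
R_inv = inverse_from_factors(A, u)
# R times R_inv should be the identity matrix, up to rounding error.
check = [[sum(R[j][i] * R_inv[i][k] for i in range(4)) for k in range(4)]
         for j in range(4)]
```

With observed correlations the factor model holds only approximately, so the result is an approximation to the inverse of the observed matrix, as the text notes.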

16.8. Estimation by Minimizing Unique Factors 

An alternative to the ordinary regression method for estimating factors has been 
proposed by Bartlett [25]. Whereas in the previous methods the sum of the squares 
of discrepancies between the true and estimated factors over the range of individuals 
is minimized, now the sum of squares of the unique factors over the range of variables 
will be minimized. This method is in harmony with Bartlett’s principle that unique 
factors should be introduced only in order to explain discrepancies between observed 
values and postulated general or group factors. 

To indicate how the theory is developed, consider the case of only two common 
factors with the following factor pattern: 

(16.53) $z_j = a_{j1}F_1 + a_{j2}F_2 + d_jU_j \qquad (j = 1, 2, \ldots, n).$

The explicit expression for the unique factor of any variable $z_j$ is

(16.54) $U_j = (z_j - a_{j1}F_1 - a_{j2}F_2)/d_j,$

and the sum of the squares of all such factors may be denoted by the function

(16.55) $U(F_1, F_2) = \sum_{j=1}^{n} U_j^2 = \sum_{j=1}^{n} (z_j - a_{j1}F_1 - a_{j2}F_2)^2/d_j^2.$

Then to minimize the sum of squares of the unique factors over the range of variables,
it is necessary that the partial derivatives of the function U with respect to $F_1$ and $F_2$
vanish, i.e.,

$\dfrac{\partial U}{\partial F_1} = -2\sum \dfrac{1}{d_j^2}(z_j - a_{j1}F_1 - a_{j2}F_2)a_{j1} = 0,$

$\dfrac{\partial U}{\partial F_2} = -2\sum \dfrac{1}{d_j^2}(z_j - a_{j1}F_1 - a_{j2}F_2)a_{j2} = 0,$

where the summations extend from j = 1 to j = n. These equations may be put in
the form


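(16.56)
$\bar{F}_1\sum \dfrac{a_{j1}^2}{d_j^2} + \bar{F}_2\sum \dfrac{a_{j1}a_{j2}}{d_j^2} = \sum \dfrac{a_{j1}z_j}{d_j^2},$

$\bar{F}_1\sum \dfrac{a_{j1}a_{j2}}{d_j^2} + \bar{F}_2\sum \dfrac{a_{j2}^2}{d_j^2} = \sum \dfrac{a_{j2}z_j}{d_j^2},$

where bars have been placed on the F’s to distinguish the estimates of the factors
from the true factors. This pair of simultaneous equations can be solved for the two
unknowns, $\bar{F}_1$ and $\bar{F}_2$, by the methods of chapter 3.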

The foregoing development can be generalized to the case of m factors. The set of
equations (16.56), written in matrix form, then becomes:

(16.57) $\mathbf{J}\bar{\mathbf{f}} = \mathbf{A}'\mathbf{D}^{-2}\mathbf{z}$,

in which the matrix $\mathbf{J}$ is defined in (16.50). Although formula (16.57) is not a special
case of formula (16.49), it may be noted that if the term $\boldsymbol{\Phi}^{-1}$ is dropped from the
matrix $\mathbf{L}$ in that formula it becomes identical with (16.57). The computations required
in the application of formula (16.57) are the same as those described for the short
method, except that nothing is added to the matrix $\mathbf{J}$. The general computing scheme
of Table 16.8 is applicable in a simplified version, in which $\boldsymbol{\Phi}^{-1}$ is not employed
and $\mathbf{J}$ replaces $\mathbf{L}$.
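
As a rough numerical illustration of formula (16.57) for two factors, the sketch below forms $\mathbf{C}' = \mathbf{P}'\mathbf{D}^{-2}$ and $\mathbf{J} = \mathbf{P}'\mathbf{D}^{-2}\mathbf{P}$ and solves for the coefficient matrix $\mathbf{J}^{-1}\mathbf{C}'$ directly, instead of using the square root worksheet. The pattern coefficients and uniquenesses are the three-decimal values printed in Table 16.10B, so the results agree with the table only to within rounding; the function names are hypothetical.

```python
# Pattern coefficients (T_1, T_2) and uniquenesses of the eight physical
# variables, as printed to three decimals in Table 16.10B.
P = [(0.894, 0.051), (0.956, -0.027), (0.932, -0.052), (0.879, 0.029),
     (0.005, 0.930), (-0.025, 0.825), (-0.060, 0.769), (0.080, 0.685)]
U = [0.154, 0.111, 0.175, 0.202, 0.132, 0.339, 0.450, 0.471]


def bartlett_coefficients(P, U):
    """Return J = P'D^-2 P and the coefficient matrix J^-1 C' (two factors only)."""
    n = len(P)
    C = [[P[j][p] / U[j] for j in range(n)] for p in range(2)]        # C' = P'D^-2
    J = [[sum(C[p][j] * P[j][q] for j in range(n)) for q in range(2)]
         for p in range(2)]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    J_inv = [[J[1][1] / det, -J[0][1] / det],
             [-J[1][0] / det, J[0][0] / det]]
    G = [[sum(J_inv[p][q] * C[q][j] for q in range(2)) for j in range(n)]
         for p in range(2)]
    return J, G


def estimate_factors(G, z):
    """Apply the coefficients to one individual's vector of standardized scores z."""
    return [sum(g * zj for g, zj in zip(row, z)) for row in G]


J, G = bartlett_coefficients(P, U)
```

A call such as `estimate_factors(G, z)` would then give the two factor estimates for an individual whose eight standardized scores are in `z`; the scores of the two girls are in Table 16.4 and are not repeated here.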

The calculation of factor estimates according to formula (16.57) is presented in
Table 16.10, both schematically and for the same numerical example employed in
the previous sections.

Table 16.10

Estimation by Minimization of Unique Factors

A. Computing Algorithm

Variable     Supplementary Matrices          Basic Matrices
             (columns T_1, ..., T_m)         (columns 1, 2, ..., n)

T_1 .. T_m                                   P' (Transpose of factor pattern of factors for
                                             which prediction equations are desired)

                                             Uniquenesses from diagonal matrix D^2 (From initial
                                             orthogonal solution even if P is oblique)

                                             Diagonal of D (Square roots of preceding numbers)

T_1 .. T_m                                   P'D^{-1} (Entries in P' divided by number in
                                             corresponding column of preceding line)

             J (Row-by-row multiplication    C' = P'D^{-2} (Entries in P' divided by uniqueness
             of P'D^{-1} by itself)          for corresponding variable)
             (Let J = S'S)

             S (Square root of matrix J)     SJ^{-1}C' (Square root operator applied to
                                             preceding matrix C')

T_1 .. T_m                                   Γ' = J^{-1}C' (Back solution, using formulas of type
                                             (3.27), computing the last line first and working up)

The data through the matrix $\mathbf{C}'$ are identical to that employed
in the short method. The supplementary matrix to the left of $\mathbf{C}'$, however, is simply $\mathbf{J}$
instead of the matrix $\mathbf{L}$ of the preceding tables. Then the square root of matrix $\mathbf{J}$
operating on $\mathbf{C}'$ produces $\mathbf{S}\mathbf{J}^{-1}\mathbf{C}'$. If this expression is premultiplied by $\mathbf{S}^{-1}$ there
results the matrix of coefficients of the variables in the prediction equations for the
factors as given by (16.57), namely:

(16.58) $\boldsymbol{\Gamma}' = \mathbf{J}^{-1}\mathbf{C}'$,

in which the coefficients are designated by γ’s to distinguish them from the conventional
regression coefficients, and $\mathbf{C}'$ as defined in Table 16.10 corresponds to $\mathbf{A}'\mathbf{D}^{-2}$
of (16.57). As before, it is not necessary to calculate the inverse of the square root
matrix since the same effect can be accomplished by the back solution of the square
root method outlined in Step 9 of 3.4.
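
The reason the back solution yields the required product can be set out in a few lines. The sketch assumes that $\mathbf{S}$ is the upper triangular square root with $\mathbf{J} = \mathbf{S}'\mathbf{S}$, as indicated in Table 16.10A; the symbol $\mathbf{X}$ is introduced here only as shorthand for the right-hand block.

```latex
\begin{aligned}
\mathbf{J} = \mathbf{S}'\mathbf{S}
  &\;\Longrightarrow\; \mathbf{J}^{-1} = \mathbf{S}^{-1}(\mathbf{S}')^{-1}, \\
\mathbf{X} \equiv \mathbf{S}\mathbf{J}^{-1}\mathbf{C}'
  &= \mathbf{S}\,\mathbf{S}^{-1}(\mathbf{S}')^{-1}\mathbf{C}' = (\mathbf{S}')^{-1}\mathbf{C}', \\
\mathbf{S}^{-1}\mathbf{X}
  &= \mathbf{S}^{-1}(\mathbf{S}')^{-1}\mathbf{C}' = \mathbf{J}^{-1}\mathbf{C}'.
\end{aligned}
```

Because $\mathbf{S}$ is triangular, premultiplying by $\mathbf{S}^{-1}$ amounts to solving from the last row upward. With the values printed in Table 16.10B for the first variable, the last row gives $.108/3.302 = .033$ and the first row gives $(1.231 - (-.020)(.033))/4.716 = .261$, which are the first entries in the final two lines of the table.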

The γ-coefficients of the standardized variables for estimating the common factors
by (16.57) are given in the last two lines of Table 16.10B. For the two girls previously
considered, estimates of the factor measurements follow:

$T_{11} = -0.35$, $T_{21} = 2.03$;

$T_{12} = 1.28$, $T_{22} = 0.56$.


Table 16.10 (continued)

B. Example of Eight Physical Variables
(Oblique Primary-Factor Solution)

Variable   Supplementary                            Basic Matrices
             Matrices
             T_1     T_2         1       2       3       4       5       6       7       8

T_1                           .894    .956    .932    .879    .005   -.025   -.060    .080
T_2                           .051   -.027   -.052    .029    .930    .825    .769    .685

                              .154    .111    .175    .202    .132    .339    .450    .471

                              .392    .333    .418    .449    .363    .582    .671    .686

T_1                          2.281   2.871   2.230   1.958                            .117
T_2                           .130   -.081   -.124           2.562   1.418   1.146    .999

          22.236   -.099     5.805   8.613   5.326   4.351    .038   -.074   -.133    .170
               *  10.912      .331   -.243   -.297    .144   7.045   2.434   1.709   1.454

           4.716   -.020     1.231   1.826   1.129    .923    .008   -.016   -.028    .036
                   3.302      .108   -.063   -.083    .049   2.134    .737    .517    .441

T_1                           .261    .387    .239    .196    .004   -.002   -.005    .008
T_2                           .033   -.019   -.025    .015    .646    .223    .157    .134

The reliability of prediction can be judged by the appropriate standard error or
multiple correlation coefficient. It should be noted, however, that formula (16.25) 
does not yield the required multiple correlation for the present method. If formula 
(16.25) is applied, the resulting “multiple correlation” will be found to be equal to 
unity. This follows from the fact that the common factors are estimated under the 
condition that the unique factors are minimized, and the uniqueness is the standard 
error of estimate in a pattern equation. 

Bartlett [25, p. 100] proposes as a measure of the “information (reciprocal of the
error variance)” for any factor p, the expression

(16.59) $|\mathbf{J}|/|\mathbf{J}_{pp}|$,

where $|\mathbf{J}_{pp}|$ is the minor of the element in row and column p of $|\mathbf{J}|$. From this expression,
a measure of “multiple correlation” for the prediction of a factor p by the method
discussed in this section may be put in the form:

(16.60) $R_p^2 = 1 - |\mathbf{J}_{pp}|/|\mathbf{J}|$.

This form corresponds to the multiple correlation for the conventional regression
estimate of a factor p (shown for the orthogonal case by Dwyer [108, p. 229]), namely:

(16.61) $R_p^2 = 1 - |\mathbf{L}_{pp}|/|\mathbf{L}|$,

where $\mathbf{L}$ is defined in (16.50) and $|\mathbf{L}_{pp}|$ is the minor of the element in row and column
p of $|\mathbf{L}|$.

The multiple correlations for the factors estimated in Table 16.10B, as given by
formula (16.60) are

$R_1 = \sqrt{1 - 10.912/242.629} = .977,$

$R_2 = \sqrt{1 - 22.236/242.629} = .953.$

These values are practically identical to those obtained by the complete estimation
method or the short method. For the particular example it may be concluded that,
statistically, the different methods for estimating factors are equally good. For other
data the method of this section may lead to results radically different from those of the
conventional regression methods (compare exercises 2 or 6 with 13). The choice of the
method of this section instead of one of the other methods must be made on the basis
of the principle of prediction which is involved.
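
The arithmetic behind these multiple correlations can be checked with a few lines. The sketch below takes the matrix $\mathbf{J}$ as printed in Table 16.10B and applies formula (16.60) for the two-factor case, where the minor of a diagonal element is simply the other diagonal element. The printed values .977 and .953 are reproduced when the square root of the right-hand member of (16.60) is taken. The function name is hypothetical.

```python
import math


def bartlett_multiple_correlations(J):
    """Return |J| and R_p = sqrt(1 - |J_pp| / |J|) for a 2 x 2 matrix J."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    # For a 2 x 2 matrix the minor of element (p, p) is the other diagonal element.
    minors = [J[1][1], J[0][0]]
    return det, [math.sqrt(1.0 - minor / det) for minor in minors]


J = [[22.236, -0.099], [-0.099, 10.912]]   # matrix J as printed in Table 16.10B
det_J, (R1, R2) = bartlett_multiple_correlations(J)
print(round(det_J, 3), round(R1, 3), round(R2, 3))   # 242.629 0.977 0.953
```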

The estimates of the common factors by means of formula (16.57) provide an 
alternative solution to any of the regression estimates of the preceding sections. 
Bartlett pointed out that the principle of estimation adopted in this section does not 
completely agree with the solution that has usually been employed, although the 
difference does not affect the relative weights assigned to the variables in estimating 
a single general factor. When several common factors are involved, however, the 
discrepancy between equations (16.49) and (16.57) is even more serious. Bartlett [25] 
states: “One point of view appears to have been to consider all the persons with 
different possible factorial make-ups that would give rise to the observed test scores
of a particular person, whereas I have regarded the test scores as a sample of all
possible scores that might have arisen for that person according to the different
values of specific [unique] factors he may happen to have.”


16.9. Factor Measurements by Ideal Variables

While on the subject of factor measurements, one more procedure will be indicated.
It is possible to get a mathematical solution for the factors in terms of the variables
($z''_j$) in the common-factor space. To this end write (16.1) in the form:

(16.62) $\mathbf{z}'' = \mathbf{A}\mathbf{f}$,

where $\mathbf{z}''$ is the column vector $\{z''_1\ z''_2\ \cdots\ z''_n\}$ of the variables projected into the
common-factor space. The factors may be determined as linear combinations of the hypothetical
variables ($z''_j$) much as was done in 16.3 for the case of principal components.
Premultiplying (16.62) by $\mathbf{A}'$ and designating the diagonal matrix of m eigenvalues by

(16.63) $\boldsymbol{\Lambda}_m = \mathbf{A}'\mathbf{A}$

as in (8.22), the explicit solution for $\mathbf{f}$ becomes:

(16.64) $\mathbf{f} = \boldsymbol{\Lambda}_m^{-1}\mathbf{A}'\mathbf{z}''$.

This result is very similar to (16.3) for principal components except for the fact that
the observed variables are replaced by “ideal” variables in the present case.

To illustrate the present method, formula (16.64) will be applied to the example
of eight physical variables. The matrix $\mathbf{A}$ of coefficients of the common factors
$T_1$ and $T_2$ is the matrix $\mathbf{P}$ of Table 16.8. The matrix $\boldsymbol{\Lambda}_m$ and its inverse follow:

$\boldsymbol{\Lambda}_2 = \begin{bmatrix} 3.365 & -.011 \\ -.011 & 2.613 \end{bmatrix}, \qquad \boldsymbol{\Lambda}_2^{-1} = \begin{bmatrix} .297 & .001 \\ .001 & .383 \end{bmatrix}.$

The inverse matrix may be computed by the method outlined in 3.5, or, for the simple
case of a second-order matrix, directly from the definition of an inverse in 3.2,
paragraph 22. The only calculation remaining is the multiplication by $\mathbf{A}'$, producing
the following expressions for the factors:


$T_1 = .266z''_1 + .284z''_2 + .277z''_3 + .261z''_4 + .002z''_5 - .007z''_6 - .017z''_7 + .024z''_8,$

$T_2 = .020z''_1 - .009z''_2 - .019z''_3 + .012z''_4 + .356z''_5 + .316z''_6 + .294z''_7 + .262z''_8.$

These equations give the descriptions of the two oblique factors (not their estimates)
in terms of the “common-factor portions” of the original variables. The values of
the variables $z''_j$ are not known, and hence cannot be applied directly to obtain the
measurements of the factors for the individuals. By replacing each $z''_j$ by $z_j$, however,
an approximation can be obtained. Denoting such approximations by double primes,
the measurements for the two girls previously studied become:

Case 1: $T''_{11} = -.036$, $T''_{21} = 1.98$;

Case 2: $T''_{12} = 1.31$, $T''_{22} = 0.68$.

These values are different from those obtained by the complete estimation method
or the short method, as should be expected. Their being of the same order of magnitude
is due to the relatively small uniqueness, so that the ideal variables are not too far
removed from the observed variables. Also, it is almost unfortunate that the eight
physical variables constitute such a simple, well-structured set of data for which
many different approaches lead to nearly equivalent results.
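
The coefficients in the two expressions for $T_1$ and $T_2$ can be checked with a short sketch that forms $\mathbf{A}'\mathbf{A}$, inverts it, and multiplies by $\mathbf{A}'$. The pattern coefficients are the three-decimal values printed in Table 16.10B, on the assumption that they are the same matrix $\mathbf{P}$ that the text takes from Table 16.8; the function name is hypothetical, and agreement with the printed figures is to within rounding.

```python
# Pattern coefficients of T_1 and T_2 for the eight physical variables,
# as printed in Table 16.10B (assumed equal to the matrix P of Table 16.8).
P = [(0.894, 0.051), (0.956, -0.027), (0.932, -0.052), (0.879, 0.029),
     (0.005, 0.930), (-0.025, 0.825), (-0.060, 0.769), (0.080, 0.685)]


def ideal_variable_weights(P):
    """Return A'A, its inverse, and the weights (A'A)^-1 A' of formula (16.64)."""
    n = len(P)
    lam = [[sum(P[j][p] * P[j][q] for j in range(n)) for q in range(2)]
           for p in range(2)]                                  # A'A
    det = lam[0][0] * lam[1][1] - lam[0][1] * lam[1][0]
    lam_inv = [[lam[1][1] / det, -lam[0][1] / det],
               [-lam[1][0] / det, lam[0][0] / det]]            # inverse of a 2 x 2 matrix
    W = [[sum(lam_inv[p][q] * P[j][q] for q in range(2)) for j in range(n)]
         for p in range(2)]                                    # (A'A)^-1 A'
    return lam, lam_inv, W


lam, lam_inv, W = ideal_variable_weights(P)
```

Row `W[0]` holds the eight weights for $T_1$ and row `W[1]` those for $T_2$. Applying them to observed standardized scores instead of the unknown ideal variables gives the approximations described in the text.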

* * * 

Certain consequences of the factor analysis model should be noted. The fundamental
assumption that the observed variables are linear functions of the factor
variables implies that certain kinds of data (e.g., color of eyes, nationality) cannot 
be studied in the framework of factor analysis. A corollary condition is that all 
observed variables must be linearly related to one another. In general if factor-analysis 
methods were to be applied to qualitative or non-linear data, new equations would 
have to be derived to fit the data. Hence, the first step to take in a factor analysis is 
to make certain that the n observed variables are linearly related. Very often this is 
not done because of the implied labor. At least a visual inspection if not a formal 
statistical check should be made for each pair of observed variables. If non-linearity 
appears, then at least one of the non-linear pair is not a linear function of the factors. 

Sometimes factor analysis is applied to non-linearly related variables, provided 
their relationships are monotonic. This is done with the belief that a straight line is 
always a good approximation to a monotonic function. Under this premise, most of 
the variance will be explained by the common-factor coefficients, even though a 
portion may be lost due to the inadequate fit of a straight line to a curve. Hence, 
while the factor analysis model may not be appropriate in a strict sense, it may be 
very useful in explaining the correlations and extracting as much variance as possible 
by means of the factors. 

A purely statistical restriction is the requirement that each observed variable be
normally distributed. While considerable latitude might be allowed, nevertheless a
variable which is distinctly non-normal should not be included in the analysis. It
must be remembered that the n variables are presumed to have a multivariate normal
distribution in the mathematical developments leading to the large-sample $\chi^2$ tests
of 9.5 and 10.4. In other words, the powerful statistical methods of chapters 9 and
10 will lead to sound conclusions provided the basic assumptions are met.

Factor analysis has made tremendous progress in recent years. Much of this modern
development is included in the text. However, what has been presented certainly is
not all there is to say about factor analysis. Some of the omitted areas are: (a) applications
of factor analysis in general; (b) specific applications toward construction of
theories in the behavioral sciences, such as the works of Ahmavaara [4], Burt [60],
Cattell [76], Henrysson [226], Stephenson [449], and Vernon [499]; (c) inverted factor
techniques as expounded principally by Cattell and Stephenson; and (d) image analysis
of Guttman [176]. This book is intended to provide the logical foundation and to
build the theories and computing procedures for the major methods of factor analysis.

The new computing techniques, the objective procedures for determining simple- 
structure solutions, and the statistical tests of hypotheses are largely responsible for 
bringing modern factor analysis out of the abyss of a psychological fetish to the 
heights of a respected branch of statistical multivariate analysis.

PART V


PROBLEMS AND EXERCISES 








Problems 


CHAPTER 2 

1. Why is each of the following types of factors postulated:

(a) common, (b) unique, (c) specific, (d) error?

2. Why are all variables and factors taken to be in standard form?

3. Write equations (2.9) for n = 5 and m = 2.

4. (a) What elements of these equations must be calculated in obtaining a factor
solution?

(b) How many of these elements are there to be determined in the case of exercise

5. (a) Write the contributions of the common factors to the variance of $z_1$ in the
following pattern equation:

$z_1 = .5F_1 + .8F_2 + .33U_1$

(b) Write the contributions of the common factors to the variance of $z_2$ in exercise

6. Obtain the total contributions of the common factors in exercise 3.

7. What is the communality of $z_1$ in exercise 5?

8. Why cannot the communality of a variable exceed its reliability?

9. In the following table there are eight exercises in which two quantities are given
and the remaining three can be determined therefrom. Complete the table.

Variance Component

a 

b 

c 

d 

e 

/ 

Specificity 

.10 

.15 

.20 

.25 

05 

.10 

Error variance 

.20 

.75 



.60 

Communality 

Uniqueness 

Reliability 


.35 

.85 

.45 




h 


.30 

.90 


Exercises 10-25 are based upon the data of Table I. 


Table I

Coefficients of Two Uncorrelated Common Factors

Variable     F_1     F_2

   1          .7      .3
   2          .8       0
   3          .7       0
   4          .8      .6
   5          .6      .5
   6          .5       0
   7          .6      .4
   8          .7      .6


10. Calculate the communality of variable 8. 

11. Which variable has the highest communality? 

12. Which variable has the lowest communality? 

13. Calculate the uniqueness of variable 1. 

14. Which variable has the lowest uniqueness? 

15. Which variable has the highest uniqueness? 

16. Is a general factor present in this solution? 

17. Is a group factor present in this solution? 

18. Find the total contributions of $F_1$ and $F_2$.

19. What per cent of the total variance is attributable to each of the common factors? 

20. What per cent of the total communality is attributable to each of the common 
factors? 

21. If the reliability of variable 5 is .84, write the complete linear description of this
variable, including the specific and error factors.

22. Calculate the index of completeness of factorization for variable 5.

23. What is the complexity of variable 1? Of variable 2?

24. Calculate the correlations $r'_{12}$, $r'_{14}$, $r'_{25}$, $r'_{26}$.

25. Are any of the reproduced correlations equal to zero?


CHAPTER 3 

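Evaluate the determinants in exercises 1-6:

1. $\begin{vmatrix} 5 & 3 \\ 2 & 4 \end{vmatrix}$   2. $\begin{vmatrix} 3 & 4 \\ -2 & 6 \end{vmatrix}$   3. $\begin{vmatrix} 4 & -2 \\ -7 & 3 \end{vmatrix}$

4. $\begin{vmatrix} 4 & 3 & 4 \\ 2 & 1 & 5 \\ 1 & 5 & 3 \end{vmatrix}$   5. $\begin{vmatrix} 3 & -2 & 4 \\ 8 & 5 & -3 \\ 1 & 2 & 1 \end{vmatrix}$   6. $\begin{vmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 1 & 3 & 3 \end{vmatrix}$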

Find the ranks of the matrices in exercises 7-9:

                    9.
   2   1                   .81   .54   .72
   1   2             C =   .54   .40   .54
   3   3                   .72   .54   .73

10. Are the matrices B and C singular or nonsingular? 

11. Which of the matrices A, B, C is symmetric? 

12. Postmultiply the matrix A by the matrix B. What is the rank of the product 
matrix? 


13. Write the transpose of: (a) the pattern matrix of ex. 3, chap. 2; (b) the pattern 
matrix of Table I. 


14. Assume the following pattern (with uncorrelated factors):

$z_1 = .5F_1 + .8F_2 + .33U_1$

$z_2 = .8F_1 + .3F_2 + .52U_2$

$z_3 = .6F_1 + .6F_2 + .53U_3$

$z_4 = .7F_1 + .4F_2 + .59U_4$

$z_5 = .7F_1 + .7F_2 + .14U_5$

(a) Calculate the matrix $\mathbf{R}^{\dagger}$ of reproduced correlations (to 2 decimal places) by
means of equation (2.50).

(b) Interpret the diagonal elements.

15. Assume the following factor solution:

$z_1 = .80F_1 + .60F_2$

$z_2 = .73F_1 + .68F_2$

$z_3 = .87F_1 + .49F_2$

$z_4 = .51F_1 + .86F_2$

$z_5 = .64F_1 + .77F_2$

(a) Calculate $\mathbf{R}^{\dagger}$.

(b) Interpret the diagonal elements.


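16. For the numerical examples:

$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix}$

show that:

(a) $\mathbf{AB} \neq \mathbf{BA}$, i.e., multiplication of matrices is not commutative;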

(b) |AB| = |A| • |B|, i.e., the determinant of the product of two square matrices is 
equal to the product of their determinants; 

(c) |AB| = |BA|, i.e., the determinant of the product of two square matrices is 
independent of the order of multiplication of the matrices. 

17. Compute the product matrices AB, BA, A'B, BA', B'A, AB', A'B', B'A' when the
individual matrices are given by:

$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 1 \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} 6 & 0 & 0 \\ 7 & 8 & 0 \\ 7 & 4 & 2 \end{bmatrix}$.

Verify, with these particular examples, that multiplication of matrices is not
commutative. Also verify the theorem on the transpose of products of matrices,
viz., (AB)' = B'A', (A'B)' = B'A, etc.

18. For the matrices of the preceding exercise, obtain the values of the determinants 
|A|, |B|, |AB|, and |BA|. 


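19. Given the following two matrices:

$\mathbf{C} = \begin{bmatrix} 2 & 4 & 6 \\ 3 & 1 & 2 \end{bmatrix}$ and $\mathbf{D} = \begin{bmatrix} 9 & 1 \\ 3 & 2 \\ 2 & 7 \end{bmatrix}$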


(a) Compute the product matrices CD and DC. 

(b) Are the determinants of the resulting matrices equal? 


20. Show that the product (in either order) of a matrix by its transpose is a symmetric
matrix.


17. Calculate the direction cosines of these vectors.

18. (a) Find the angle of separation of these vectors.

(b) Determine the coefficient of correlation by use of formula (4.48).

19.


values in formula (2. ^ * he ““ 


Exercises 20-22 are based upon the data of Table I. 

20. Are the two columns of the pattern matrix linearly independent? 

21. What is the rank of the reproduced correlation matrix $\mathbf{R}^{\dagger}$ (determined in ex. 25,

22. What subset of variables would give a correlation matrix of rank one?

Exercises 23-25 are based upon the data of Table III.

Table III

Coefficients of Two Uncorrelated Common Factors

Variable     F_1     F_2

   1         .86     .43
   2         .48     .24
   3         .70     .35
   4         .50     .25
   5         .64     .32
   6         .56     .28


23. Plot the points representing the six variables in the common factor space.

24. (a) Are the two columns of the pattern matrix linearly independent?

(b) Do all the second order minors vanish?

(c) Why can these six variables be described in terms of only one common factor?

25. Obtain a factor solution in terms of only one common factor, calculating the
factor weights: (a) from the plot of exercise 23; (b) by employing formula (4.52).

26. Given the following portion of an orthogonal factor pattern:

$z_1 = .4F_1 + .2F_2 + .1F_3 - .2F_4,$

$z_2 = .5F_1 + .6F_2 + .4F_3 + .2F_4.$

Calculate:

(a) the correlation corrected for uniqueness;

(b) the reproduced correlation, as the scalar product of the two vectors in the
common-factor space;

(c) the reproduced correlation, as the cosine of the angle between the two vectors
in the total factor space.

CHAPTER 5

1. In a properly designed experiment with expected relationships among the correlations
(i.e., not arbitrary or independent), how would you use Table 5.2 to find the
smallest number of variables required to determine:

(a) four factors,

(b) six factors?

2. Why is it not desirable to employ only eleven variables to determine six factors?

Exercises 3-6 are based upon the data of Table IV, Set A.


Table IV 

Intercorrelations of Two Sets of Five Variables 



3. Show that the five variables can be described in terms of only one common 
factor by employing the conditions (5.9). 

4. (a) Calculate the communality of the first variable by means of equations (5.8).

(b) Why are the numerical values of the six distinct expressions of (5.8) exactly
equal?

5. For a given variable, how many different (a) tetrads and (b) triads are there?

6. Obtain the coefficients of the common factor.

7. Assume a matrix of correlations among five variables, with unknown communalities
$h_j^2$ (j = 1, ..., 5). Equations (5.9) constitute five linearly independent
conditions that the correlations must satisfy if the five variables are to be described
in terms of only one common factor. Equations (5.9) were obtained as a result of
formally solving for $h_1^2$. Obtain an equivalent set of equations by solving for $h_2^2$,
and show that they are linear combinations of equations (5.9).

Exercises 8-13 are based upon the data of Table IV, Set B:

8. Test the set of correlations to see if the five variables can be described in terms
of only one common factor:

(a) Calculate the communalities under the assumption that the rank of the
correlation matrix is one.

(b) Write the factor pattern.

(c) Calculate the matrix of reproduced correlations. 

(d) Calculate the residuals. 

9. Show that the intercorrelations of the five variables satisfy the condition (5.15) 
exactly. 

10. Calculate the communalities under the assumption that the rank of the correlation
matrix is two.

11. Obtain a solution in terms of two common factors. Hint: Select one of the factor
coefficients, say, $a_{11}$, arbitrarily.

12. Check that the correlations are reproduced exactly by the solution of exercise 11. 

13. Discuss the relative merits of the solutions in exercises 8 and 11 considering the 
data (a) as exact, (b) as observations subject to fluctuations of sampling. 

14. Check that the rank of the correlation matrix in Table V can appropriately be 
assumed to be equal to two. 

Table V

Intercorrelations of Six Physical Variables for 305 Girls

Variable                   1       2       3       4       5       6

1. Height
2. Length lower leg      .859
3. Sitting height        .740    .451
4. Weight                .473    .436    .507
5. Chest girth           .301    .327    .327    .730
6. Chest depth           .201    .227    .211    .611    .484

15. Calculate the communalities of the variables in Table V by means of formula (5.19). 

16. Calculate the communalities for the eight political variables of Table 8.17. 

17. Why are communalities put in the principal diagonal of a correlation matrix? 

18. Make an outline of various methods that may be employed in estimating communalities,
and comment on their relative merits.

19. The communalities of the five fictitious variables of Table IV, Set B were computed
under the assumption of unit rank in exercise 8, and under the assumption
of rank 2 in exercise 10. Compare these values with the “arbitrary estimates”
given by:

(a) the highest correlation for each variable;

(b) the average correlation; and

(c) the single triad, formula (5.33).

Also compare these values with the “complete estimates” given by: 

(d) the first centroid factor, formula (5.36). 

20. Determine the estimates of communality (a), (b), (c), and (d), as in exercise 19, 
for the six physical variables of Table V and compare with the communalities 
computed in exercise 15, under the assumption of rank 2. 

21. Compute the SMCs for the example of eight physical variables. Their intercorrelations
are given in Table 5.3 and the inverse of the correlation matrix is
computed in ex. 3, chap. 16. Compare these SMCs with the communalities
determined in Table 5.4.

22. Employing the approximation to $\mathbf{R}^{-1}$ for the eight physical variables shown in
Table 16.9, calculate the SMCs for these variables. How do these results compare
with those determined from the actual $\mathbf{R}^{-1}$ in exercise 21?

CHAPTER 6 

1. The following questions are intended to be provocative and for general class 
discussion. 

(a) Why can a set of variables be interpreted in terms of a factorial solution 
involving correlated or uncorrelated factors? 

(b) What are some advantages in employing uncorrelated factors? 

(c) How may several workers arrive at the same factorial solution for a given 
matrix of correlations? 

(d) How can one judge the relative significance of the factors of a given solution?

(e) Why would the uni-factor solution be the most desirable form? Why is this 
form not likely to be obtained with observed data? 

(f) Why is a more complex solution than the uni-factor type likely to furnish a 
more satisfactory description of observed data? 

(g) Why is the multiple-factor solution formulated so as not to include a general 
factor? 

(h) How may the selectivity of the sample affect the intercorrelations of a set of 
variables, and the subsequent factorial analysis? 

(i) Can a new variable be added to a given set without changing the factorial 
solution of the original portion? 

(j) When new variables are added to a set for which a factorial solution has been 
obtained, is any one of the preferred forms likely to be more “invariant” 
than the others? 

(k) How would you justify the co-existence of several preferred solutions for a 
given body of data? How would you make a choice among them? 

2. Give illustrations, other than those in the text, in which bipolar factors furnish
more convenient interpretations than those with all positive coefficients.

3. (a) What happens to the factorial composition of a variable when its scale is
reversed?

(b) What is the effect on the reproduced correlations when all the coefficients
of any factor are multiplied by -1?

CHAPTER 7 

Exercises 1-2 are based upon the correlations among the variables 5, 6, 7, 8, 9 in
Table 7.4:

1. Assume rank one for the matrix of correlations, and calculate a “two-factor” 
pattern by means of formula (7.10). 

2. (a) Compute the residuals. 

(b) Check the significance of these residuals, by use of Table A in Appendix 
(justifying the assumption of exercise 1). 

3. Obtain another permissible solution (with more than one common factor) for 
the correlation matrix of Table 7.2 which led to the Heywood case. 

4. Consider a set of ten variables ($z_1, z_2, \dots, z_{10}$) grouped as follows: 

$G_1 = (1, 2, 3)$, $G_2 = (4, 5, 6, 7)$, $G_3 = (8, 9, 10)$. 

Use the set theory notations of (7.11) to express the following : 

(a) variable 6 is included in group 2; 

(b) the system of elements consisting of variables 1, 2, 3 and 8, 9, 10; 

(c) the sum of the first 100 values of variable 5. 

Exercises 5-13 are based upon the data of Table VI. 


Table VI 

Intercorrelations of Twelve Psychological Tests for 355 Pupils 

Variable                            1     2     3     4     5     6     7     8     9    10    11    12 
 1. Perception of brightness 
 2. Counting dots                 .690 
 3. Straight and curved letters   .596  .655 
 4. Speed in simple code          .515  .557  .600 
 5. Verbal completion             .421  .397  .386  .255 
 6. Understanding paragraphs      .350  .300  .252  .200  .611 
 7. Reading vocabulary            .376  .349  .329  .258  .642  .576 
 8. General information           .405  .448  .351  .310  .660  .545  .738 
 9. Arithmetic proportions        .342  .381  .284  .241  .407  .428  .435  .478 
10. Permutations-combinations     .325  .377  .324  .286  .359  .407  .392  .385  .460 
11. Mechanical ability I          .260  .285  .255  .252  .321  .370  .408  .379  .406  .384 
12. Mechanical ability II         .165  .200  .146  .145  .162  .236  .303  .285  .278  .213  .398 



Source: K. J. Holzinger, [234, No. 9, Table 2]. 

5. From the definition in 7.4, calculate the following B-coefficients and interpret the 
results: (a) B(1, 2, 3); (b) B(1, 6, 12). 
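
The definition in 7.4 is not reproduced in this section, so the sketch below assumes the usual form of the coefficient of belonging: 100 times the ratio of the average intercorrelation within the group to the average correlation of the group's variables with all remaining variables. The function names `r` and `b_coefficient` are illustrative; the correlations are those of Table VI.

```python
# Lower triangle of Table VI: row j holds r_j1 ... r_j,j-1.
LOWER = [
    [],
    [.690],
    [.596, .655],
    [.515, .557, .600],
    [.421, .397, .386, .255],
    [.350, .300, .252, .200, .611],
    [.376, .349, .329, .258, .642, .576],
    [.405, .448, .351, .310, .660, .545, .738],
    [.342, .381, .284, .241, .407, .428, .435, .478],
    [.325, .377, .324, .286, .359, .407, .392, .385, .460],
    [.260, .285, .255, .252, .321, .370, .408, .379, .406, .384],
    [.165, .200, .146, .145, .162, .236, .303, .285, .278, .213, .398],
]


def r(j, k):
    """Correlation between variables j and k (1-based, j != k)."""
    if j < k:
        j, k = k, j
    return LOWER[j - 1][k - 1]


def b_coefficient(group, n=12):
    """Assumed definition: 100 * (mean correlation within group) / (mean correlation with the rest)."""
    group = sorted(group)
    others = [v for v in range(1, n + 1) if v not in group]
    p = len(group)
    within = sum(r(j, k) for i, j in enumerate(group) for k in group[i + 1:])
    outside = sum(r(j, k) for j in group for k in others)
    mean_within = within / (p * (p - 1) / 2)
    mean_outside = outside / (p * (n - p))
    return 100 * mean_within / mean_outside


print(round(b_coefficient([1, 2, 3]), 1))
print(round(b_coefficient([1, 6, 12]), 1))
```

Under this assumed definition a value well above 100 indicates that the variables correlate more highly among themselves than with the rest of the set, and a value below 100 indicates the opposite.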

6. Starting with the B-coefficient of exercise 5(a), determine B(1, 2, 3, 4) by means of 
the following steps: 

(a) Denote the sums in the calculation of B(1, 2, 3) by $S_3$ and $T_3$. 

(b) Add test 4 to the group and determine $L_4$ by formula (7.18). 

(c) Calculate $S_4$ and $T_4$ by means of (7.19) and (7.20). 

(d) Employing the results of (c), calculate B(1, 2, 3, 4) by means of formula (7.16). 

7. Allocate the twelve tests to appropriate groups, employing the B-coefficient 
technique as outlined in Table 7.5. 

8. Make a bi-factor pattern plan, using the groups determined in the last exercise. 

9. Write formula (7.25) for the determination of the general-factor coefficient for 
the first variable: (a) in the set theory notation, (b) in the conventional summation 
notation, indicating the limits, (c) in expanded form, indicating the individual 
correlations. 

10. Calculate the general-factor coefficients. 

11. Obtain the general-factor residuals and check the satisfactoriness of the pattern 
plan. 

12. (a) Calculate the group-factor coefficients, and write the complete factor pattern, 
(b) Determine the communalities. 

13. Obtain the final residuals and test the adequacy of the solution. 

Exercises 14—18 are based upon the data of Table VII. 


Table VII 

Correlations and Portion of Bi-Factor Solution for 
Five Physical Variables for 305 Girls 

                              Correlations (r_jk)         General       Residuals (r_jk) 
Variable                    1     2     3     4     5    Factor F_0    1     2     3     4     5 
1. Height                                                   .691 
2. Arm span               .846                              .591     .438 
3. Length of forearm      .805  .881                        .581     .404  .538 
4. Length of lower leg    .859  .826  .801                  .598     .446  .473  .454 
5. Sitting height         .740  .497  .494  .451            .674     .274  .099  .102  .048 


Source: Frances Mullen [375, p. 20]. 

14. In a study [375] involving 17 physical variables the five variables of Table VII 
were found to belong together according to their B-coefficient, and a bi-factor 
pattern was postulated with a general physical (or growth) factor $F_0$ and a group 
factor $F_1$ through these variables. Given the general-factor coefficients and the 
residuals, determine whether the original hypothesis of a single group factor for 
the five variables is warranted. Proceed as follows: 

(a) Compute the B-coefficients of variable 5 with each of the other four variables, 
one at a time, employing the residuals. 

(b) Compute the B-coefficients of variable 5 with all combinations of the other 
four variables, two at a time. 

(c) Compute the B-coefficients of variable 5 with all combinations of the other 
four variables, three at a time. 

(d) Compute B(1, 2, 3, 4). 

(e) What conclusion may be drawn from the contrast of (d) and the preceding 
B-coefficients? 
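
Before working with the residuals of Table VII it may help to confirm how they arise. The sketch below assumes that each general-factor residual is the observed correlation less the product of the two general-factor coefficients; the dictionary names are illustrative only, and the numbers are those printed in Table VII.

```python
# General-factor coefficients for variables 1-5 (Table VII).
a0 = [.691, .591, .581, .598, .674]

# Observed correlations, keyed by (j, k) with j < k (Table VII).
observed = {
    (1, 2): .846, (1, 3): .805, (1, 4): .859, (1, 5): .740,
    (2, 3): .881, (2, 4): .826, (2, 5): .497,
    (3, 4): .801, (3, 5): .494,
    (4, 5): .451,
}

# Residuals as printed in Table VII, for comparison.
printed = {
    (1, 2): .438, (1, 3): .404, (1, 4): .446, (1, 5): .274,
    (2, 3): .538, (2, 4): .473, (2, 5): .099,
    (3, 4): .454, (3, 5): .102,
    (4, 5): .048,
}

# Residual = observed correlation minus the product of general-factor coefficients.
residuals = {
    (j, k): round(rjk - a0[j - 1] * a0[k - 1], 3) for (j, k), rjk in observed.items()
}

for pair in sorted(residuals):
    print(pair, residuals[pair], printed[pair])
```

The residuals involving variable 5 with variables 2, 3, and 4 are much smaller than the others, which is the contrast the B-coefficients of this exercise are meant to bring out.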

15. Test the statistical significance of the general-factor residuals for variable 5, 
employing an average correlation r = .355 (based on all 17 variables in [375, 
p. 20]) and N = 305. 

16. Formulate a revised pattern plan for the five variables, based upon the findings 
in exercises 14 and 15. 

17. Determine the new group-factor coefficients. 

18. Assuming a doublet factor for variables 1 and 5, and allowing one standard error 
for chance error, why cannot the remaining variance be divided equally between 
the two variables? 


CHAPTER 8 

Exercises 1-8 are based upon the data of Table V, and employ the computing 
procedures of 8.5. 

1. Employing the communalities which were determined in ex. 15, chap. 5, write 
the complete correlation matrix and determine the first set of trial values. 

2. Square the matrix of correlations as many times as necessary, determining a set 
of trial values at each stage, until the successive trial values agree to within 
5 units in the third decimal place. 

3. Calculate the coefficients of the first principal factor by means of 8.5, paragraph 3. 

4. Find the matrix of first-factor residuals. 

5. Determine the best set of trial values for the calculation of the second-factor 
coefficients, following the procedure of paragraph 5. 

6. Compute the coefficients of the second principal factor. 

7. Calculate the second-factor residuals, and decide whether they may be considered 
as final residuals. 

8. Check the properties (8.23) of the principal-factor solution. 
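
The iterative scheme of exercises 1-3 (square the matrix, take trial values, repeat until they settle) can be sketched as follows. The exact worksheet of 8.5 is not reproduced in this section, so this is only an approximation of it: trial values are taken as row sums scaled by the largest, the root is estimated by a Rayleigh quotient, and the 3-variable matrix `R` (with communalities in the diagonal) is invented for illustration.

```python
from math import sqrt


def matmul(x, y):
    """Plain-Python product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trial_values(m):
    """Row sums of m, scaled so that the largest is 1."""
    sums = [sum(row) for row in m]
    top = max(abs(s) for s in sums)
    return [s / top for s in sums]


def first_principal_factor(r, tol=0.005, max_squarings=20):
    """Approximate the first principal factor of r by repeated squaring of the matrix."""
    power = [row[:] for row in r]
    current = trial_values(power)
    for _ in range(max_squarings):
        power = matmul(power, power)
        scale = max(abs(v) for row in power for v in row)
        power = [[v / scale for v in row] for row in power]  # rescale to keep numbers moderate
        previous, current = current, trial_values(power)
        if max(abs(c - p) for c, p in zip(current, previous)) < tol:
            break
    n = len(r)
    rv = [sum(r[i][j] * current[j] for j in range(n)) for i in range(n)]
    root = sum(x * y for x, y in zip(current, rv)) / sum(x * x for x in current)
    norm = sqrt(sum(x * x for x in current))
    # Scale the unit-length vector so that the sum of squared coefficients equals the root.
    return [sqrt(root) * x / norm for x in current], root


# Hypothetical reduced correlation matrix (not the data of Table V).
R = [
    [.70, .56, .48],
    [.56, .60, .42],
    [.48, .42, .50],
]

loadings, root = first_principal_factor(R)
print([round(a, 3) for a in loadings], round(root, 3))
```

The default tolerance mirrors the "5 units in the third decimal place" of exercise 2; a smaller `tol` simply forces more squarings.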

9. Compute the principal-factor solution for the six hypothetical variables of Table 
5.5, employing the “actual” communalities of Table 5.6. What is exceptional 
about the six eigenvalues? 

10. Obtain the first two principal factors for the eight physical variables example of 
Table 5.3, employing the SMCs (determined in ex. 21, chap. 5) for the diagonal 
values of the correlation matrix. Compare the results with the solution in Table 
8.12 for which communalities were employed. 

Exercises 11-17 are based upon the data of Table V and employ the computing 
procedures of 8.9, paragraph 2. 

11. Put the communalities which were determined in ex. 15, chap. 5 in the principal 
diagonal of the correlation matrix, calculate the sums $S_j$ ($j = 1, \dots, 6$), and then 
compute the total $T$ of these sums. 

12. Calculate the coefficients of the first centroid factor and apply the check (8.71). 

13. Calculate the products of the first-factor coefficients, and determine the first- 
factor residuals. Check that the sum of all the residuals for each variable is zero. 
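
Exercises 11-13 can be sketched in a few lines. The sketch assumes the usual first-centroid formula, in which each coefficient is the sum $S_j$ divided by the square root of the total $T$; the matrix `R` is a hypothetical reduced correlation matrix, not the data of Table V, and the function names are illustrative.

```python
from math import sqrt


def first_centroid_factor(r):
    """First centroid coefficients under the assumed formula a_j = S_j / sqrt(T)."""
    sums = [sum(row) for row in r]  # S_j, including the diagonal communality
    total = sum(sums)               # T
    return [s / sqrt(total) for s in sums]


def residual_matrix(r, a):
    """First-factor residuals: r_jk minus the product a_j * a_k."""
    n = len(r)
    return [[r[j][k] - a[j] * a[k] for k in range(n)] for j in range(n)]


# Hypothetical reduced correlation matrix with communalities in the diagonal.
R = [
    [.70, .56, .48],
    [.56, .60, .42],
    [.48, .42, .50],
]

loadings = first_centroid_factor(R)
residuals = residual_matrix(R, loadings)
row_sums = [sum(row) for row in residuals]
print([round(a, 3) for a in loadings])
print(row_sums)
```

The zero row sums follow from the formula itself: the sum of the coefficients is $\sqrt{T}$, so the products in row $j$ add up to exactly $S_j$.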

14. Reflect the variables with a large number of negative signs in the residual matrix, 
following the procedure of Table 8.25. 

15. Calculate the coefficients of the second centroid factor, and apply the checks. 

16. Calculate the second-factor residuals, and decide whether they may be considered 
as final residuals. 

17. Summarize the results of the analysis, showing the complete centroid pattern 
and the degree to which the centroid factors account for the original communality 
assumed in exercise 11. 

18. Check the properties (8.61) for the foregoing centroid solution. Why are the 
variables reflected in the computation of the second and later centroid axes? 

19. Compare the centroid solution of exercise 17 with the principal-factor solution 
obtained in exs. 3, 6. Comment on the extent of agreement for this six variable 
problem and what might be expected in larger problems. 

20. Contrast the properties (8.23) of a principal-factor solution with the properties 
(8.61) of a centroid solution. 

CHAPTER 9 

1. Starting with the objective function (9.9) for the sum of squares of residual 
correlations with a fixed variable j, derive a formula for the minres solution for the case 
of m = 1 (i.e., a single common factor). 

2. For the five hypothetical variables of Table 7.2: 

(a) Obtain a minres solution in terms of one factor, using the formulas derived 
in the preceding exercise. 

(b) Compare the minres solution (after 3 iterations) with the original Heywood 
solution, in terms of the residuals and of the size of the objective function. 

3. For the problem of five socio-economic variables, apply the asymptotic $\chi^2$ test 
to determine the significance of one, two, and three factors at the .1%, 1%, and 
5% levels. While the illustrative example actually contains only 12 cases, and the 
approximations made in arriving at the statistic (9.31) assumed a large sample, 
nonetheless as an exercise apply the tests for N = 12, 50, 100, 200. (The necessary 
determinants were produced by a computer as follows: $|R| = .0016908$ for the 
original correlations; and $|R^\dagger_1| = .1273853$, $|R^\dagger_2| = .0023976$, $|R^\dagger_3| = .0016908$ for 
the reproduced correlations for minres solutions with m = 1, 2, 3, respectively.) 

4. Obtain a minres solution, with m = 2, for the six hypothetical variables of Table 
5.5. 


CHAPTER 10 

1. For the binomial distribution 

$$f(X; p) = p^X (1 - p)^{1 - X} \qquad (X = 0, 1)$$ 

involving samples of size N, a particular set of observations $X_1, X_2, \dots, X_N$ 
consists of a sequence of zeros and ones. Suppose that in a sample of N = 100 
a total of 18 successes were noted. Determine the following for this case: 

(a) The likelihood function L (i.e., the joint distribution of the sample values) 

(b) The logarithm (to base e) of L. 

(c) The derivative of log L with respect to p. 

(d) The maximum-likelihood estimator $\hat{p}$. 
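
The four steps of this exercise can be written out as one short derivation. It is offered as a worked sketch, assuming the N = 100 observations are independent and that the sum of the observations is 18.

```latex
\begin{align*}
L &= \prod_{i=1}^{N} p^{X_i} (1 - p)^{1 - X_i}
   = p^{\sum X_i} (1 - p)^{N - \sum X_i}
   = p^{18} (1 - p)^{82}, \\
\log L &= 18 \log p + 82 \log (1 - p), \\
\frac{d \log L}{dp} &= \frac{18}{p} - \frac{82}{1 - p}, \\
\frac{d \log L}{dp} = 0 \;&\Longrightarrow\; 18 (1 - p) = 82 p
   \;\Longrightarrow\; \hat{p} = \frac{18}{100} = .18 .
\end{align*}
```

The estimator is simply the observed proportion of successes, which is the pattern the factor-analytic likelihood equations of 10.5 generalize.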

Exercises 2 and 3 are based on the data of Table 7.1, and employ the computing 
procedures of 10.5. 

2. Since Spearman used these data to demonstrate the two-factor theory, it is 
reasonable to hypothesize a single common factor and to take the general-factor 
coefficients from Table 7.1 as first approximations for the calculation of maximum- 
likelihood estimates. Set up a worksheet in the form of Table 10.3, but for m = 1, 
and determine: (a) the first iteration; (b) additional iterations, as necessary, to 
obtain convergence of the factor weights. 

3. Test the hypothesis of m = 1, proceeding as follows: 

(a) calculate the reproduced correlations and residuals, and determine the sum 
for formula (10.29); 

(b) complete the calculation of $U_1$; 

(c) determine the number of degrees of freedom; 

(d) for this number of degrees of freedom, what value of $\chi^2$ produces a probability 
of P = .01? 

(e) how does the actual $U_1$ compare with this value? 

(f) what conclusion may be drawn regarding the hypothesis? 

4. For the five socio-economic variables the observed correlations are given in 
Table 2.2 and the maximum-likelihood solution, under the hypothesis of m = 2, 
is in Table 10.7. Test this hypothesis, organizing your work as in Tables 10.4 
and 10.5. 

Exercises 5-8 are based on the data of Table V. Although it is reasonable to assume 
two common factors for these data (see ex. 14, chap. 5), other assumptions are made 
for the sake of exercise. 

5. Obtain a maximum-likelihood solution under the assumption m = 1, employing 
the coefficients of the first principal factor (ex. 3, chap. 8) as first approximations. 

6. Since the natural choice of ex. 5 for first approximations to the 
maximum-likelihood loadings led to lengthy computations, start with the following trial 
values to see how soon convergence is obtained: 


Exercise      1       2       3       4       5       6 
a           1.000    .800    .700    .600    .500    .400 
b            .900    .700    .600    .500    .400    .300 
c            .950    .850    .750    .500    .350    .250 


(In order to get the full value of the convergence problem indicated in exs. 5 and 
6, without requiring the laborious calculations on the part of each student, it is 
suggested that teams be formed and the work shared.) 


7. Test the hypothesis of m = 1. 


8. Assume m = 2 and take as starting values the principal-factor coefficients from 
exs. 3 and 6, chap. 8, to set up a worksheet like Table 10.3 and carry through 
one complete iteration. 


9. If a computer and a suitable program are available, obtain a maximum-likelihood 
solution for m = 2 starting with the first two principal components rather than 
the principal factors of the preceding exercise. 


10. Obtain maximum-likelihood estimates of factor loadings for the eight physical 
variables, whose correlation matrix is given in Table 10.2, under the assumption 
that m = 2 and employing the principal-factor coefficients for the initial set of 
trial values. 

11. Test the hypothesis of the preceding example. 

12. If N were 120 instead of 305 for the same correlations for the eight physical 
variables, would the hypothesis of m = 2 be rejected? How about the hypothesis of 
m = 3? 

13. Suppose N were only 100 instead of 305, what conclusions could be drawn about 
the significance of 2 or 3 common factors for the eight physical variables? 

CHAPTER 11 

Exercises 1-7 are designed for practice in applying the multiple-group method, and 
are based on the data for the eight physical variables: 

1. Set up the reduced correlation matrix by employing the correlations from Table 
5.3 and the communalities from Table 5.4. Assume the previously determined 
grouping of variables: 

$G_1$: (1, 2, 3, 4), lankiness; $G_2$: (5, 6, 7, 8), stockiness. 

Determine the sums (11.1) and record them in the form of Table 11.4. 

2. Obtain the sums (11.3) of correlations between groups, as in Table 11.5, and 
calculate the standard deviations of the composite variables according to (11.7). 

3. Calculate the correlation between the two oblique factors, $T_1$ and $T_2$, by means 
of (11.10). 

4. Determine the oblique factor structure, the elements of which are given by (11.11). 

5. Obtain the oblique factor pattern and the orthogonal factor matrix by setting up 
a worksheet like Table 11.2 and following the instructions given therein. 

6. Compute the reproduced correlations by employing the orthogonal factor matrix 
and check the results by using the oblique structure and pattern. 

7. Show the residuals. Are these residuals of the order of magnitude of final residuals 
of previous solutions? What percent of the original communality assumed in 
exercise 1 is accounted for by two multiple-group factors? 

8. Develop in detail the formula (11.5), or (11.6), for the variance of $T_1$, assuming 
three variables in $G_1$. 

CHAPTER 12 

1. Determine the transformation matrix T in (12.1) for the twenty-four psychological 
tests when A is the biquartimin pattern (Table 15.5) and B is the quartimax 
pattern (Table 14.3). 

2. Employing the transformation equation (12.1), calculate $b_{11}$, $b_{12}$, $b_{13}$, $b_{14}$, $b_{53}$, 
$b_{24,4}$ and compare the results with the original values in Table 14.3. 

3. The multiple-factor solution (Table 12.5) for the thirteen psychological tests was 
obtained by graphical methods, considering two factors at a time, as described in 
12.3. Check the coefficients for $z_1$ and $z_{13}$ in the multiple-factor pattern of Table 
12.5 by applying the complete transformation matrix (12.29) to the original 
centroid pattern of Table 8.30. 

4. When a multiple-factor solution is obtained by intuitive-graphical methods it is 
not expected that a particular solution obtained by one worker could be replicated 
by another, or by the same individual at another time. As exercises in applying 
the methods of 12.3, the eight, thirteen, and twenty-four variable examples can be 
employed, with independent graphing and calculations, and checked against the 
results obtained in the text. 

5. Rotate the minres solution of ex. 4, chap. 9 to “simple structure” by the hand 
methods of 12.3. Compare the results with the direct solution of Table 5.8. 

CHAPTER 13 

1. Compare the primary-factor solution (Table 13.1) for the eight physical variables 
with the multiple-group solution given in the table of ex. 5, chap. 11. 

Exercises 2-9 are designed for practice in obtaining a primary-factor solution 
according to the outline in 13.3, and employ the thirteen psychological tests with the initial 
centroid solution of Table 8.30. 

2. Available information about these data (chap. 7) suggests the three composite 
variables: 

$v_1 = z_1 + z_2 + z_3 + z_4$ 
$v_2 = z_5 + z_6 + z_7 + z_8 + z_9$ 
$v_3 = z_{10} + z_{11} + z_{12} + z_{13}$ 

Calculate the standard deviations of these composite variables. 

3. Express the composite variables (standardized) in terms of the factors of the 
initial solution. 

4. Determine the distances of the three composite points from the origin. 

5. Obtain the transformation matrix from the centroid factors ($C_p$) to the primary 
factors ($T_p$). 

6. Calculate the correlations among the primary factors by use of formula (13.16). 

7. Obtain the primary-factor structure by applying (13.21). 

8. Obtain the primary-factor pattern by using the computing algorithm of Table 
13.2. 

9. Determine the contributions of the primary factors according to formulas (13.28) 
and (13.29). 

10. Develop formula (13.37) starting with (13.36). 

11. Set up a worksheet like Table 13.2 and show the detailed calculations for the 
pattern W of the oblique reference solution of Table 13.3. 

CHAPTER 14 

1. Compute the sum of fourth powers of the factor loadings for the eight physical 
variables in the graphical solution of Table 12.3, and compare with the value for 
the analytical solution of Table 14.2. 

2. (a) Using the principal-factor solution (Table 8.18) for the eight political variables 

as the initial matrix A, compute the quartimax solution. 

(b) Compare the quartimax criterion of the final solution with that of the initial 
solution. 

3. Compute the varimax criterion V for the quartimax solution for the eight physical 
variables (Table 14.2), and compare the result with the values for the graphical 
solution (Table 12.3) and the varimax solution (Table 14.5). 

4. (a) Starting with the principal-factor solution for the eight political variables 

(Table 8.18), compute the corresponding varimax solution. 

(b) Compare the varimax criterion for the initial and the final solutions. 

5. Compute the root mean square values for the differences between the following 
pairs of solutions for the eight physical variables: 

(a) Varimax and Subjective 

(b) Varimax and Quartimax 

(c) Quartimax and Subjective. 
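
The two quantities used in exercises 1 and 5 are easy to compute once the loadings are in hand. The sketch below uses an invented two-factor matrix rather than any table from the text, and it assumes that the "root mean square value" of the differences is taken over all corresponding coefficients of the two solutions; the function names are illustrative.

```python
from math import sqrt, cos, sin, pi


def sum_fourth_powers(a):
    """Sum of fourth powers of all factor loadings."""
    return sum(v ** 4 for row in a for v in row)


def rms_difference(a, b):
    """Root mean square of the element-by-element differences between two solutions."""
    squares = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sqrt(sum(squares) / len(squares))


def rotate(a, angle):
    """Orthogonal rotation of a two-factor matrix through the given angle (radians)."""
    c, s = cos(angle), sin(angle)
    return [[x * c + y * s, -x * s + y * c] for x, y in a]


# Hypothetical loadings close to simple structure (not from the text).
simple = [[.9, .0], [.8, .1], [.1, .7], [.0, .6]]
blurred = rotate(simple, pi / 4)

print(round(sum_fourth_powers(simple), 4), round(sum_fourth_powers(blurred), 4))
print(round(rms_difference(simple, blurred), 4))
```

Because the rotation is orthogonal the communalities are unchanged, yet the sum of fourth powers drops when the loadings are spread evenly over both factors, which is the behaviour the first exercise asks the reader to compare across solutions.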

CHAPTER 15 

1. Compute the oblimax criterion K for the centroid solution of the eight physical 
variables (initial solution A in Table 15.3). 

2. Show that the value of the quartimin criterion N for the factor structure of Table 
13.3 does not differ significantly from the value for the quartimin solution of 
Table 15.3, although the minres solution served as the starting point for the 
former while the centroid solution was used in the latter. 

3. The quartimin solution (obtained on a desk calculator) for the eight political 
variables is shown in Table 15.8. The initial matrix A is the principal-factor 
solution of Table 8.18. Determine the final quartimin solution through the following 
procedures of 15.3: 

(a) Select the first column (i.e., x = 1) of the transformation matrix $\Lambda$ to initiate 
the iteration process and determine the values $w_j$ for $p \neq 1$. 

(b) Employing the values from (a) for the diagonal matrix W, determine the 
matrix C according to (15.24). 

(c) Write the characteristic equation and solve for the least root $N_x$ (x = 1). 

(d) Obtain the values of $\lambda_{11}$ and $\lambda_{21}$. 

(e) Determine the new elements $v_{j1}$ resulting from the first iteration. 

(f) Select x = 2 for the second iteration and repeat the process (a)-(e), arriving at 
new elements $v_{j2}$. 

(g) Perform as many iterations as necessary, alternating between x = 1 and 
x = 2, until the value of N converges (to the minimum of N = .1819 as 
indicated in the text). Write the final transformation matrix. 

(h) Verify a few entries in the final structure matrix V by applying (15.1) with 
the transformation matrix determined in (g). 

4. Calculate the following matrices for determining the primary-factor pattern 

corresponding to the structure V in Table 15.3: 

(a) The inverse of the transformation matrix given in (15.30). 

(b) The transformation matrix T. 

(c) The diagonal matrix D. 

(d) The primary-factor pattern P. 

Also calculate: 


(e) The correlation between the two primary factors. 

Problems 5-7 are based on the first two principal components (see Table 8.1) for the 
five socio-economic variables as the initial factor matrix A, and certain results 
produced by a computer. 


5. Get the quartimin primary-factor pattern corresponding to the reference structure 
matrix 

$$V = \begin{bmatrix} -.0954 & .9857 \\ .9359 & -.1142 \\ .0264 & .9585 \\ .7695 & .3519 \\ .9628 & -.1145 \end{bmatrix}$$ 

and the transformation matrix 

$$\Lambda = \begin{bmatrix} .7515 & .4758 \\ -.6597 & .8795 \end{bmatrix}.$$ 

6. Get the biquartimin primary-factor pattern corresponding to the reference 
structure matrix 

$$V = \begin{bmatrix} -.0366 & .9909 \\ .9399 & -.0669 \\ .0850 & .9697 \\ .8000 & .3954 \\ .9672 & -.0658 \end{bmatrix}$$ 

and the transformation matrix 

$$\Lambda = \begin{bmatrix} .7893 & .5196 \\ -.6140 & .8544 \end{bmatrix}.$$ 

7. Determine the common-factor variance of variable 1 from the primary-factor 
pattern found in ex. 5 and compare it with the value from the initial factor matrix A. 
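
One way to organize exercises 5 and 7 is sketched below. The chapter 15 formulas are not reproduced in this section, so the relations used are assumptions: the reference structure is taken as V = AΛ, D as the diagonal matrix that scales the rows of the inverse of Λ to unit length, the primary-factor pattern as P = VD⁻¹, and the factor correlation as the inner product of the normalized rows. The data are those of exercise 5; all variable and function names are illustrative.

```python
from math import sqrt

# Reference structure V and transformation matrix Lambda from exercise 5.
V = [[-.0954, .9857], [.9359, -.1142], [.0264, .9585], [.7695, .3519], [.9628, -.1145]]
LAM = [[.7515, .4758], [-.6597, .8795]]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


lam_inv = inverse_2x2(LAM)

# D: normalizing factors of the rows of the inverse (assumed definition).
row_norms = [sqrt(sum(x * x for x in row)) for row in lam_inv]
D = [1 / norm for norm in row_norms]

# Rows of the inverse scaled to unit length; their inner product is the factor correlation.
unit_rows = [[x * d for x in row] for row, d in zip(lam_inv, D)]
phi = sum(x * y for x, y in zip(unit_rows[0], unit_rows[1]))

# Primary-factor pattern P = V D^{-1} (assumed relation).
P = [[v / d for v, d in zip(row, D)] for row in V]

# Initial orthogonal matrix recovered from V = A Lambda.
A = [[sum(row[p] * lam_inv[p][q] for p in range(2)) for q in range(2)] for row in V]


def communality_oblique(p_row, r12):
    """Common-factor variance from an oblique pattern with factor correlation r12."""
    return p_row[0] ** 2 + p_row[1] ** 2 + 2 * p_row[0] * p_row[1] * r12


def communality_orthogonal(a_row):
    """Common-factor variance from an orthogonal factor matrix."""
    return sum(x * x for x in a_row)


print(round(phi, 4))
print(round(communality_oblique(P[0], phi), 4), round(communality_orthogonal(A[0]), 4))
```

Under these assumptions the two communalities for variable 1 agree, since the oblique transformation redistributes the common-factor variance without changing its total.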

8. The quartimin reference structure for the eight physical variables is given in 
Table 15.4. This solution, based on the minres initial matrix of Table 9.3, was 
produced by a computer which also gave the transformation matrix 

$$\Lambda = \begin{bmatrix} .6032 & .4052 \\ -.7976 & .9142 \end{bmatrix}.$$ 

Obtain the quartimin primary-factor pattern corresponding to this reference 
structure. 

9. The reference structure V in Table 15.5 was computed on an electronic computer. 
One of the outputs of the computer program is the transformation matrix: 

$$\Lambda = \begin{bmatrix} .3520 & .2699 & .3684 & .2944 \\ -.6047 & .5822 & -.3846 & .5051 \\ -.6432 & -.4960 & .6839 & .3712 \\ .3109 & -.5849 & -.4985 & .7214 \end{bmatrix}$$ 

(a) Obtain the inverse of $\Lambda$. First apply the square root method of 3.5 to the 
symmetric matrix 

$$\Psi = \Lambda'\Lambda$$ 

to get $\Psi^{-1}$, and then the desired result follows from (15.44). 

(b) Determine the diagonal matrix D, i.e., the normalizing factors of the rows 
of $\Lambda^{-1}$. 

(c) Determine the transformation matrix T for the primary-factor solution. 

Problems 10-13 are based on the 3-variable example employed by Jennrich and 
Sampson [279] for which they assume the following initial pattern matrix (in terms of 
orthogonal factors): 

$$A = \begin{bmatrix} .960 & .140 \\ .480 & .070 \\ .560 & .300 \end{bmatrix}$$ 

10. Get the value of the criterion (15.45) for the initial factor pattern for each of the 
following values of the parameter $\delta$: 

0, -.5, -1, -5, .1, .5, .8, 1. 

11. Normalize matrix A by rows, and determine the value of the criterion function 
for $\delta = 0$ in this case. 

12. Plot the three variables with reference to the orthogonal common-factor axes. 
What is special about these points? 

13. For $\delta = 0$, a computer produced the following factor pattern: 

j      b_j1       b_j2 
1     .97015     .00000 
2     .48508     .00000 
3     .00000     .63530 

The correlation between the primary factors is - .80411. 

(a) What is the value of the criterion function for this solution? 

(b) Determine the primary-factor structure for this solution. 

Problems 14-15 are based on the minres solution (Table 9.2) for the five 
socio-economic variables as the initial factor matrix A. 

14. Compute the criterion (15.45) for A, with $\delta = 0$. 

15. Normalize matrix A by rows and, for $\delta = 0$, determine the initial value of the 
criterion function and the final value after the function has been minimized. 
Show the complete direct oblimin ($\delta = 0$) solution. (This exercise requires the 
use of a computer, and may be useful as a test case.) 

CHAPTER 16 

1. In 16.3 the measurements of the rotated varimax components for the five 
socio-economic variables were obtained by use of (16.7). The same results can be 
obtained by use of (16.12); show the computations leading to the two equations 
by means of the latter formula. 

2. From the fictitious data presented by Holzinger [239, p. 130], the following table 
is derived (with $r_{F_1F_2} = .5$): 

Variable     Intercorrelations        Structure          Pattern       Uniqueness 
   j          1      2      3       r_jF1   r_jF2      b_j1   b_j2       d_j^2 
   1        1.00    .81    .87       .90     .60        .8     .2         .16 
   2          *    1.00    .83       .85     .65        .7     .3         .21 
   3          *      *    1.00       .95     .55        .9     .1         .09 

Obtain the complete estimates of the two oblique factors (use the square root 
method, following the procedure of Table 16.3). Also determine the multiple 
correlations. 
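
For readers with a computer at hand, the complete estimation of exercise 2 can be checked without the square root worksheet. The sketch below assumes the standard regression result that the weights for a factor are obtained by solving the normal equations with the matrix of intercorrelations on the left and that factor's column of the structure on the right, and that the squared multiple correlation is the sum of products of weights and structure values. It solves the equations by ordinary elimination rather than by the procedure of Table 16.3; the function and variable names are illustrative.

```python
from math import sqrt

# Intercorrelations and factor structure from the table of exercise 2.
R = [[1.00, .81, .87], [.81, 1.00, .83], [.87, .83, 1.00]]
S = [[.90, .60], [.85, .65], [.95, .55]]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


weights = []      # weights[p][j]: regression weight of variable j+1 in the estimate of factor p+1
multiple_r = []   # multiple correlation of each factor with the three variables
for p in range(2):
    structure_column = [row[p] for row in S]
    beta = solve(R, structure_column)
    weights.append(beta)
    multiple_r.append(sqrt(sum(b * s for b, s in zip(beta, structure_column))))

for p in range(2):
    print([round(b, 3) for b in weights[p]], round(multiple_r[p], 3))
```

The printed weights are the coefficients of the standardized variables in the two estimation equations, and they can be compared with the values obtained by hand from the square root method.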

3. In the complete estimation method of 16.4 the inverse of a matrix of observed 
correlations is required. For the example of eight physical variables, the square 
root method is applied in Table 16.3 to yield the necessary regression weights. 
Employing the square root matrix S (assuming R = S'S) of Table 16.3, obtain 
$R^{-1}$. 

4. Employing the matrix equation (16.18), show the explicit formula for the 
prediction of $T_2$ in the factor analysis of the eight physical variables (use $R^{-1}$ computed 
in ex. 3). How does the result compare with the second of equations (16.36)? 

5. Prove that the matrix L, defined in (16.50), is a symmetric matrix. 

6. Apply the short method, using the algorithm of Table 16.8, to estimate the two 
factors of the fictitious data in exercise 2. Also determine the multiple correlations. 

7. Verify the calculations of the ^-coefficients in the last two rows of Table 16.8 by 
determining E~ 1 and carrying out the multiplication E~ ^EL" 

8. The quartimin solution for the eight physical variables consists of the reference 
structure (in Table 15.3) and the primary-factor pattern which was determined in 
ex. 4, chap. 15. Starting with this factor pattern and correlation of .483 between 
the primary factors, compute the regression weights for these factors by the short 
method using the worksheet of Table 16.8. 

9. A by-product of the computer program for the oblimin solutions of 15.4 is a 
matrix of numbers which can easily be converted to the factor measurements as 
estimated by the short method. For the four biquartimin factors (Table 15.5) for 
the twenty-four psychological tests, the computer output the following “factor 
score coefficients”: 

           1      2      3      4      5      6      7      8      9     10     11     12 
T_1     -.042  -.013  -.013  -.006   .230   .222   .339   .089   .324   .024   .010  -.036 
T_2      .015  -.009  -.032  -.011   .056  -.035   .038   .056  -.135   .330   .229   .297 
T_3      .291   .113   .169   .170  -.022  -.012  -.035   .086  -.030  -.147  -.069   .044 
T_4     -.011  -.004  -.020  -.034  -.044   .047  -.117  -.046   .125   .059   .124  -.053 

          13     14     15     16     17     18     19     20     21     22     23     24 
T_1     -.009   .019  -.004  -.030  -.002  -.053  -.006   .024  -.016   .029   .020   .039 
T_2      .249   .005  -.026  -.029   .003   .034   .002  -.023   .089  -.054   .017   .124 
T_3      .137  -.051  -.002   .108  -.060   .071   .031   .131   .106   .101   .206  -.003 
T_4     -.111   .188   .185   .155   .317   .234   .118   .049   .040   .117   .014   .090 


(a) Convert these numbers to regression coefficients by premultiplying the above 
matrix by D (determined in ex. 9, chap. 15). 

(b) How could the multiple correlations be computed for the prediction of the 
four biquartimin factors? 


10. Why are the coefficients in the equation derived in exercise 4 different from the 
values in the last line of Table 16.9? 

11. Compare the approximation to $R^{-1}$ as given by $(R^\dagger + D^2)^{-1}$ in Table 16.9 with 
the actual $R^{-1}$ determined in exercise 3. 

12. Prove that the matrix of coefficients given by (16.58) when postmultiplied by the 
factor-structure matrix produces the matrix of correlations among the factors, 
viz., 

rs = o. 

13. (a) Apply the method of 16.8 to estimate the two factors of the fictitious data in 
exercise 2. 

(b) Compute the “multiple correlations” by means of formula (16.60). 

14. Verify the relationship proven in exercise 12 for the matrix of coefficients 
determined in exercise 13. 
















Answers 


CHAPTER 2 

1. Discussion in 2 . 4 . 2. For computational convenience. 

3. $z_1 = a_{11} F_1 + a_{12} F_2 + d_1 U_1$, 

$z_2 = a_{21} F_1 + a_{22} F_2 + d_2 U_2$, 

$z_3 = a_{31} F_1 + a_{32} F_2 + d_3 U_3$, 

$z_4 = a_{41} F_1 + a_{42} F_2 + d_4 U_4$, 

$z_5 = a_{51} F_1 + a_{52} F_2 + d_5 U_5$. 


4. (a) The a’s and d’s; (b) 15. 5. (a) .25, .64; (b) ah, a\ 2 . 

6. $a_{11}^2 + a_{21}^2 + a_{31}^2 + a_{41}^2 + a_{51}^2$ and $a_{12}^2 + a_{22}^2 + a_{32}^2 + a_{42}^2 + a_{52}^2$. 

7. h\ = .89. 

8. Reliability $= 1 - e^2 = h^2 + b^2 \geq h^2$. 

9. The given quantities are x’d out. 

Variance Component   Formula                        a     b     c     d     e     f     g     h 
Specificity          b^2 = d^2 - e^2                X     X     X     X    .35   .35   .25   .20 
Error variance       e^2 = 1 - r = d^2 - b^2        X    .10   .15   .15    X     X    .25   .10 
Communality          h^2 = 1 - d^2                 .70    X    .65   .60    X    .55    X    .70 
Uniqueness           d^2 = 1 - h^2 = b^2 + e^2     .30   .25    X    .40   .40    X    .50    X 
Reliability          r = 1 - e^2                   .80   .90   .85    X    .95   .90    X     X 

10. h\ 


.85. 


11. $z_4$ (since $h_4^2 = 1.00$). 

12. $z_6$ (since $h_6^2 = .25$). 

13. d\ = 42. 

14. $z_4$ (since $d_4^2 = 0$). 

15. $z_6$ (since $d_6^2 = .75$). 

16. Yes, $F_1$. 

17. Yes, $F_2$. 

18. $\sum_{j=1}^{8} a_{j1}^2 = 3.72$, $\sum_{j=1}^{8} a_{j2}^2 = 1.22$. 

19. 46.50% and 15.25%. 

20. 75.3% and 24.7%. 

21. $z_5 = .6F_1 + .5F_2 + .48S_5 + .4E_5$. 

22. $C_5 = 72.6\%$. 

23. 2, 1. 

24. $r'_{12} = .56$, $r'_{14} = .74$, $r'_{25} = .48$, $r'_{26} = .40$. 

25. No. 


CHAPTER 3 


1. 14. 2. 26. 

3. -2. 4. -15. 

5. 139. 6. 0. 

7. 1. 8. 2. 

9. 2. 10. Singular. 

11. C. 

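12. 1 

13. (a) 

$$A' = \begin{bmatrix} a_{11} & a_{21} & a_{31} & a_{41} & a_{51} \\ a_{12} & a_{22} & a_{32} & a_{42} & a_{52} \end{bmatrix}$$ 

(b) 

$$A' = \begin{bmatrix} .7 & .8 & .7 & .8 & .6 & .5 & .6 & .7 \\ .3 & 0 & 0 & .6 & .5 & 0 & .4 & .6 \end{bmatrix}$$ 

14. (a) 

$$R^\dagger = \begin{bmatrix} .89 & .64 & .78 & .67 & .91 \\ .64 & .73 & .66 & .68 & .77 \\ .78 & .66 & .72 & .66 & .84 \\ .67 & .68 & .66 & .65 & .77 \\ .91 & .77 & .84 & .77 & .98 \end{bmatrix}$$ 

15. (a) 

$$R^\dagger = \begin{bmatrix} 1.00 & .99 & .99 & .92 & .91 \\ .99 & 1.00 & .97 & .96 & .99 \\ .99 & .97 & 1.00 & .87 & .93 \\ .92 & .96 & .87 & 1.00 & .99 \\ .97 & .99 & .93 & .99 & 1.00 \end{bmatrix}$$ 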

16. (a) 

AB = 

"4 

8 

is] BA = 

“ 3 
J1 

4" 

16_r 


(b) |AB| = 4,1 A( • |B| = ( —2)( —2); (c) |AB| = 4, |BA| = 4. 





'41 28 
35 24 


BA = 


6 12 18 
7 22 53 
7 18 39 


6 0 
19 8 

53 36 


0 0 
8 0 


6 0 
23 8 

21 12 


0 0 
8 0 


6 19 53 
0 8 36 

0 0 2 


'41 35 
B'A' = 28 24 


' 6 23 21 
0 8 12 
.002 

6 7 7 

12 22 18 
18 53 39 


“42 52“ 
_34 19J’ 


L 6 8 2J L18 53 39 J 

18. |A| = 1, |B| = 96, |AB| = |BA| = 96. 

19 - ( a ) CD = T 42 52 1 T 21 37 56 ~ 

L34 19J’ DC =s 12 14 22 . 

-25 15 26_ 

(b) No, since |CD| = -970 and |DC| = 0. 

20. Suppose the given matrix is A and let B = AA'; then

B' = (AA')' = (A')'A' = AA' = B.

Similarly, if C = A'A, then C' = (A'A)' = A'(A')' = A'A = C. Since the matrix B (or C) is
equal to its transpose, the product (in either order) of any matrix by its transpose
is symmetric according to the definition in 3.2, paragraph 14.

21. (a) |A| = ad - bc,

        A^{-1} = [  d/|A|   -b/|A| ]
                 [ -c/|A|    a/|A| ],

        |A^{-1}| = (ad - bc)/(ad - bc)^2 = 1/(ad - bc) = 1/|A|.


~— 6/21 

-3/21 

12/21 

—1/21 

3/21 

12/21 

-6/21 

-3/21 

0 

0 

0 

7/21 

15/21 

-3/21 

-9/21 

-1/21 


-2 

3 • 3 • 3 • 1 1 

21 - 21 - 21-21 0 
5 


-1 4 

4 -2 

0 0 
-1 -3 


7-7-7-21 


[ — 2( —14) - 1(7) + 5( — 14)] = LiK 49 j _ 1
22.


             R                             I
    1      .693    .216           1       0       0
    *     1        .295                   1       0
    *      *      1                               1

             S                         (S')^{-1}
    1      .693    .216           1       0       0
           .721    .202         -.961   1.387     0
                   .955         -.023   -.293   1.047

                1.924   -1.326   -.024
    R^{-1} =      *      2.010   -.307
                  *       *      1.096

Check:

                 1.000    .001   -.000
    RR^{-1} =      *     1.001   -.000
                   *       *     1.000
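
The square-root computation laid out in answer 22 can be reproduced mechanically. The sketch below assumes the usual scheme: factor R = S'S with S upper triangular, invert S' by forward substitution, and form R^{-1} by column-by-column multiplication of (S')^{-1}. The function names are illustrative only.

```python
import math


def sqrt_factor(R):
    """Upper-triangular S with S'S = R (square-root factor of R)."""
    n = len(R)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            acc = R[i][j] - sum(S[k][i] * S[k][j] for k in range(i))
            S[i][j] = math.sqrt(acc) if i == j else acc / S[i][i]
    return S


def inverse_via_sqrt(R):
    """Return (S, T, R_inv) where T = (S')^{-1} and R_inv = T'T."""
    n = len(R)
    S = sqrt_factor(R)
    T = [[0.0] * n for _ in range(n)]  # lower triangular
    for c in range(n):
        for r in range(c, n):
            rhs = (1.0 if r == c else 0.0) - sum(
                S[k][r] * T[k][c] for k in range(c, r)
            )
            T[r][c] = rhs / S[r][r]
    R_inv = [
        [sum(T[k][i] * T[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    return S, T, R_inv


R = [[1.0, 0.693, 0.216], [0.693, 1.0, 0.295], [0.216, 0.295, 1.0]]
S, T, R_inv = inverse_via_sqrt(R)
for row in R_inv:
    print([round(x, 3) for x in row])
```

Carried to more decimals than the hand computation, the entries agree with the printed R^{-1} to within a unit in the third decimal place.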


23. First obtain a symmetric matrix by multiplying the given matrix by its transpose, 
namely: 


                 10    5   10    3
    B = AA' =     5    6    6    3
                 10    6   12    3
                  3    3    3    9

Apply the method of Table 3.3 to the matrix B to obtain

                 .612    .021   -.511   -.041
    B^{-1} =     .021    .367   -.184   -.068
                -.511   -.184    .594    .034
                -.041   -.068    .034    .136

But B^{-1} = (AA')^{-1} = (A')^{-1}A^{-1}. Therefore premultiplying B^{-1} by A' yields
A^{-1}, as follows:

                -.288   -.143    .576   -.048
    A^{-1} =     .143    .571   -.285   -.143
                -.001    .000    .001    .333
                 .713   -.142   -.428   -.048

Check: |A^{-1}| = .048.

CHAPTER 4 

2. (a) 5; (b) 5; (c)(*)=10; (d) (*) = 10. 

3. The three points P: (3, 4), 2P: (6, 8), and 3P: (9, 12) lie on a line.




4. P(3, -2):(-1, 1).

5. P(2, -3, 1):(-1, 1, -8).

6. x_31 = -8.5x_11 + 4.5x_21,
   x_32 = -8.5x_12 + 4.5x_22.

7. (a) 3; (b) Ordinary space; (c) 2, a plane.

8. (a) 5; (b) √17; (c) √10.

9. (a) y_j1 = α_11 x_j1 + α_12 x_j2,
       y_j2 = α_21 x_j1 + α_22 x_j2,

   where α_1k α_1l + α_2k α_2l = δ_kl. When

       α_11 = cos θ      α_12 = sin θ
       α_21 = -sin θ     α_22 = cos θ

   it can be verified by (4.24) that the conditions for an orthogonal transformation
   are satisfied.

   (b) y_j1 = α_11 x_j1 + α_12 x_j2 + α_13 x_j3,
       y_j2 = α_21 x_j1 + α_22 x_j2 + α_23 x_j3,
       y_j3 = α_31 x_j1 + α_32 x_j2 + α_33 x_j3.

10. (a) λ_1 = .8, λ_2 = .6; (b) 1.

11. (a) λ_1 = .8, λ_2 = .4, λ_3 = .2, λ_4 = -.4; (b) 1.

12. φ = 44° 41'.     13. P · Q = .32.

14. (a) D(P_1P_2) = √13 = 3.6; (b) D(P_1P_2) = √25 = 5.

16. ρ_1 = 3.74, ρ_2 = 4.90.

17. λ_11 = -.802, λ_12 = .267, λ_13 = .535;
    λ_21 = -.408, λ_22 = -.408, λ_23 = .816.

18. (a) φ_12 = 49°; (b) r_12 = cos φ_12 = .655.

20. Yes, matrix is of rank 2.     21. 2.     22. Variables 2, 3, and 6.

24. (a) No; (b) Yes; (c) By Theorem 4.6.

25. (b) a_10 = .96, a_20 = .54, a_30 = .78, a_40 = .56, a_50 = .72, a_60 = .63.

26. (a) r'_12 = .71 by formula (4.54); (b) r'_12 = .32 by formula (4.56);
    (c) r'_12 = .32.
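
Answers 16 to 18 rest on three steps: the length of a vector from the origin, its direction cosines, and the cosine of the angle between two vectors as the sum of products of direction cosines. A minimal sketch follows. The coordinates `P1` and `P2` are an assumption, chosen only because they reproduce the printed values; the exercise's own coordinates are not shown in this answer key.

```python
import math

# Assumed coordinates (hypothetical; they reproduce the printed answers).
P1 = (-3.0, 1.0, 2.0)
P2 = (-2.0, -2.0, 4.0)


def length(p):
    """Distance of the point p from the origin."""
    return math.sqrt(sum(x * x for x in p))


def direction_cosines(p):
    """Coordinates divided by the length of the vector."""
    rho = length(p)
    return [x / rho for x in p]


def cos_angle(p, q):
    """Cosine of the angle between two vectors from the origin."""
    return sum(a * b for a, b in zip(direction_cosines(p), direction_cosines(q)))


rho1, rho2 = length(P1), length(P2)
lam1, lam2 = direction_cosines(P1), direction_cosines(P2)
r12 = cos_angle(P1, P2)
phi12 = math.degrees(math.acos(r12))
print(round(rho1, 2), round(rho2, 2), round(r12, 3), round(phi12))
```

The squared direction cosines of each vector sum to unity, which is the same check that applies to answers 10(b) and 11(b).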

CHAPTER 5 

1. (a) Read across in row m = 4 to the first positive entry, and the number n = 8 
at the head of that column is the minimum number of variables required for 
the determination of 4 factors; 

(b) n = 11.

PROBLEMS AND EXERCISES


4. (a) hi = .64; 

(b) Rank of the matrix is one mathematically, not just statistically. 

5. (a) 15; (b) 6. 6. .8, .6, .7, .4, .5. 

7. There are six linear equations for the solution of h_2^2, namely,

   h_2^2 = r_21 r_23 / r_13 = r_21 r_24 / r_14 = r_21 r_25 / r_15
         = r_23 r_24 / r_34 = r_23 r_25 / r_35 = r_24 r_25 / r_45.

On eliminating h_2^2, the consistency conditions may be put in the form:

   (i)   r_23 r_14 - r_24 r_13 = 0,
   (ii)  r_23 r_15 - r_25 r_13 = 0,
   (iii) r_12 r_34 - r_24 r_13 = 0,
   (iv)  r_12 r_35 - r_25 r_13 = 0,
   (v)   r_23 r_45 - r_24 r_35 = 0.

To show that these equations are linear combinations of those in (5.9), note that
equations (i) and (ii) are equivalent to (5.9_1) and (5.9_2), respectively; equation (iii)
is the difference of (5.9_1) and (5.9_3); and equation (iv) is the difference of (5.9_2)
and (5.9_4). To show that (v) is linearly dependent on (5.9), substitute r_13 =
r_14 r_23 / r_24, obtained from (5.9_1), into (5.9_5). The result is

   r_14 r_23 r_45 - r_14 r_35 r_24 = 0,

which reduces to (v) by factoring out r_14.
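
The argument of answer 7 can be illustrated numerically. When the correlations are generated by a single common factor, r_jk = a_j a_k, all six expressions for h_2^2 coincide and the five consistency conditions vanish. The sketch below uses assumed loadings (illustrative values, not taken from the exercise).

```python
from itertools import combinations

# Assumed one-factor loadings (hypothetical); r_jk = a_j * a_k for j != k.
a = {1: 0.8, 2: 0.6, 3: 0.7, 4: 0.4, 5: 0.5}


def r(j, k):
    """Correlation reproduced by a single common factor."""
    return a[j] * a[k]


# Six estimates h_2^2 = r_2j r_2k / r_jk over pairs (j, k) excluding variable 2.
estimates = [r(2, j) * r(2, k) / r(j, k) for j, k in combinations([1, 3, 4, 5], 2)]

# Consistency conditions (i) to (v) as listed in answer 7.
conditions = [
    r(2, 3) * r(1, 4) - r(2, 4) * r(1, 3),
    r(2, 3) * r(1, 5) - r(2, 5) * r(1, 3),
    r(1, 2) * r(3, 4) - r(2, 4) * r(1, 3),
    r(1, 2) * r(3, 5) - r(2, 5) * r(1, 3),
    r(2, 3) * r(4, 5) - r(2, 4) * r(3, 5),
]
print([round(e, 4) for e in estimates])
print([round(c, 12) for c in conditions])
```

With observed correlations the six estimates would differ, and the size of the left-hand members of (i) to (v) indicates how far the data depart from rank one.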


(a) h_1^2 = .7620          (b) a_11 = .87
    h_2^2 = .6163              a_21 = .79
    h_3^2 = .7289              a_31 = .85
    h_4^2 = .6351              a_41 = .80
    h_5^2 = .9530              a_51 = .98



(c) 









l 

2 

3 

4 




1 

2 







l 



.69 






2 

-.05 


.75 

.67 





3 

.03 

-.01 

.70 

.63 

.68 




4 

-.03 

.05 

.85 

.77 

.83 

.78 



5 

.06 

.00 

.89, . 

73, 

.72, 

.65, 

.98. 






.01 -.01 


11. One solution is the factor pattern given in ex. 14, chap. 3. 
15. .718, .494, .414, .945, .546, .363. 



16. .52, 1.00, .78, .82, .36, .80, .63, .97. 

17. According to the “common-factor” model set forth in (2.9). 


19.

               Assumed Rank      Arbitrary Estimate     Complete Estimate
Variable         1       2          (a)      (b)           (c)      (d)
   1            .76     .89         .91      .75           .84      .81
   2            .62     .73         .77      .69           .68      .65
   3            .73     .72         .84      .74           .72      .75
   4            .64     .65         .77      .70           .68      .67
   5            .95     .98         .91      .82           .98      .93

20.

Variable    Ex. 15      (a)       (b)       (c)       (d)
   1         .718      .859      .515     1.409      .644
   2         .494      .859      .460      .524      .545
   3         .414      .740      .447      .793      .484
   4         .942      .730      .551      .922      .664
   5         .546      .730      .434      .578      .459
   6         .363      .611      .347      .405      .301


21. Employing formula (5.38), and using the diagonal elements of the matrix R^{-1}
in the answer to ex. 3, chap. 16, the SMC's are:

.816, .849, .800, .788, .749, .605, .563, .477.

22. Employing the approximation to R^{-1}, the resulting SMC's are:

.802, .829, .778, .758, .715, .594, .496, .486.

In every instance except variable 8, the SMC's computed from the approximation
to R^{-1} are lower than the corresponding values determined in exercise 21. The
differences range from one to three units in the second decimal place except for
variable 7, for which the approximate solution is .067 less than the actual solution.
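
The values in answer 21 follow from the diagonal of R^{-1} alone. The sketch below assumes that formula (5.38) is the standard relation SMC_j = 1 - 1/r^{jj}, where r^{jj} is the j-th diagonal element of R^{-1}; the diagonal values are those printed in the answer to ex. 3, chap. 16, and the variable names are illustrative.

```python
# Diagonal elements of R^{-1} (answer to ex. 3, chap. 16).
diag_r_inv = [5.444, 6.632, 5.007, 4.728, 3.984, 2.529, 2.289, 1.913]

# SMC's from the approximate inverse, as printed in answer 22.
smc_approx = [0.802, 0.829, 0.778, 0.758, 0.715, 0.594, 0.496, 0.486]


def smc(r_jj_inverse):
    """Squared multiple correlation from a diagonal element of R^{-1}."""
    return 1.0 - 1.0 / r_jj_inverse


smc_exact = [round(smc(d), 3) for d in diag_r_inv]
differences = [round(e - a, 3) for e, a in zip(smc_exact, smc_approx)]
print(smc_exact)
print(differences)
```

The list of differences shows the pattern described in answer 22: small positive differences for variables 1 to 6, .067 for variable 7, and a negative difference for variable 8 only.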

CHAPTER 6 

2. A few examples follow: friendliness-hostility; active-inactive; impulsiveness-restraint;
demonstrative-inhibitive; dominance-submissiveness; heat-cold;
difficulty-ease; radicalism-conservatism.

3. (a) Its factor weights are changed in sign; (b) No effect.



11 and 13. The general-factor residuals are below the diagonal and the final residuals 
are above. 



14. (a) B(5, 1) = 107, B(5, 2) = 32, B(5, 3) = 34, B(5, 4) = 16.

(b) B(5, 1, 2) = 81, B(5, 1, 3) = 77, B(5, 1, 4) = 78, B(5, 2, 3) = 71, B(5, 2, 4) = 55,
B(5, 3, 4) = 54.

(c) B(5, 1, 2, 3) = 87, B(5, 1, 2, 4) = 79, B(5, 1, 3, 4) = 74, B(5, 2, 3, 4) = 73.

(d) B(1, 2, 3, 4) = 164.

(e) In the set of five variables, there is no doubt that 1, 2, 3, 4 belong together, 
while variable 5 does not group with any of the others; except that variables 
5 and 1 belong together to about the same extent to which they belong with the 
other three variables. 

15. σ_r = .074 from Table A in Appendix.

For r_51: .274/.074 = 3.70, P = 1 - α = .0002 from Table C in Appendix.

For r_52: .099/.074 = 1.34, P = .1802.

For r_53: .102/.074 = 1.38, P = .1676.

For r_54: .048/.074 = 0.65, P = .5092.

The probability P for the residual r_51 = .274 indicates that the observed value
would be exceeded in less than a fraction of one per cent in sampling from a true
value of zero; and hence it may be concluded that the true value of this residual
is different from zero. The probabilities for the other three residuals are each
greater than .05, the standard level of significance usually recommended, so that
the deviations of these values from zero may be attributed to chance errors.
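
The probabilities in answer 15 are two-tailed normal probabilities for the ratio of each residual to its standard error. The sketch below assumes that Table C tabulates the normal curve and that each ratio is rounded to two decimals before the table is entered; the function name is illustrative.

```python
import math

SIGMA_R = 0.074  # standard error of a zero residual, as quoted from Table A
residuals = {"r51": 0.274, "r52": 0.099, "r53": 0.102, "r54": 0.048}


def two_tailed_probability(z):
    """P(|Z| > z) for a standard normal deviate Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


results = {}
for name, value in residuals.items():
    ratio = round(value / SIGMA_R, 2)
    results[name] = (ratio, two_tailed_probability(ratio))
    print(name, ratio, round(results[name][1], 4))
```

Under these assumptions the first three probabilities agree with the printed values. For the last residual the ratio 0.65 gives a probability of about .52 rather than the printed .5092, which corresponds more nearly to a ratio of 0.66; the conclusion is unaffected.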


Variable     F_0      F_1      D_1
   1         a_10     a_11     d_11
   2         a_20     a_21      -
   3         a_30     a_31      -
   4         a_40     a_41      -
   5         a_50      -       d_51


17. a_11 = .616, a_21 = .132, a_31 = .689, a_41 = .679.

18. Since σ_r = .074, to divide the remaining variance between the two variables
means that

d_11 = d_51 = √(.274 - .074) = .447.

If this value were taken for d_11, then the communality for the first variable would
be:

h_1^2 = .691^2 + .616^2 + .447^2 = 1.057,

an impossible value if there is to be a real unique factor.


CHAPTER 8


1.

   j       S_j      α_j1^(1)
   1      3.292      .889
   2      2.794      .755
   3      2.650      .716
   4      3.702     1.000
   5      2.715      .733
   6      2.097      .566


2. Squaring the correlation matrix once, but without actually calculating the
elements of R^4, the trial values are found to stabilize, viz.,



3.

   j     α_j1     v_j1      α_j1      a_j1     Supplementary Calculations
   1     .898    2.662     .8987      .793     λ_1 = 2.962, Σ α_j1^2 = 3.8084
   2     .773    2.290     .7731      .682
   3     .737    2.185     .7377      .651     √(2.962/3.8084) = .8819
   4    1.000    2.962    1.0000      .882
   5     .732    2.169     .7321      .646     a_j1 = .8819 α_j1
   6     .568    1.683     .5682      .501     Σ a_j1^2 = 2.964


             .089    .318    .224   -.226   -.211   -.196
             .318    .029    .007   -.166   -.114   -.115
    R_1 =    .224    .007   -.010   -.067   -.094   -.115
            -.226   -.166   -.067    .167    .160    .169
            -.211   -.114   -.094    .160    .129    .160
            -.196   -.115   -.115    .169    .160    .112


5. 



j 

0,2 

4 

0/2 

m 

*J2 

HI 

U j2 

°J2 

Supplementary Calculations 

l 

1.00 

.623 

1.000 

After seven 

1.000 

.508 

X 2 = 0.914, y>? 2 = 3.5368 

2 


.551 

.884 

more itera- 

Hh 

.362 


3 


.340 

.546 

tions 



.243 

V-914/3.5368 = .5084 

4 

-.60 

-.582 

-.934 




KiEI 

-.423 


5 

-.43 

-.532 

-.854 




-.750 

-.381 

a,- 2 = -5084 <Xj 2 

6 

-.62 

-.513 

-.823 




-.740 

-.376 











14 = -914 


7.

            -.169    .134    .101   -.011   -.017   -.005
             .134   -.102   -.081   -.013    .024    .021
    R_2 =    .101   -.081   -.069    .036   -.001   -.024
            -.011   -.013    .036   -.012   -.001    .010
            -.017    .024   -.001   -.001   -.016    .017
            -.005    .021   -.024    .010    .017   -.029

8. Σ a_j1^2 = 2.964 = λ_1,     Σ a_j2^2 = .914 = λ_2,     Σ a_j1 a_j2 = .000.



9.

Variable       P_1        P_2
   1          .8600     -.0169
   2          .8341     -.1559
   3          .8646     -.3773
   4          .5776      .3956
   5          .4950      .3391
   6          .3300      .2261

The two largest eigenvalues are λ_1 = 2.8704 and λ_2 = .4896, while the remaining
four are zero. This property follows from the fact that the reduced correlation
matrix (with communality) is of rank 2, mathematically.

10.

Variable        P_1       P_2
   1           .857     -.317
   2           .846     -.398
   3           .810     -.402
   4           .832     -.335
   5           .728      .537
   6           .627      .494
   7           .567      .516
   8           .607      .360
Contribution
  of factor   4.410     1.461

This solution agrees very closely with the solution in Table 8.12, with corresponding
factor coefficients differing by less than .02 in the case of the first factor, and
by less than .03 for the second factor. The total common-factor variance accounted
for in Table 8.12 is 5.966 while the present solution (with minimal values of
communalities) accounts for 5.871.
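
In a principal-factor solution the contribution of each factor (the sum of its squared coefficients) equals the corresponding eigenvalue, and the columns of coefficients are orthogonal. The following sketch checks these properties against the coefficients printed in answer 9; the variable names are illustrative.

```python
# Principal-factor coefficients as printed in answer 9.
P1 = [0.8600, 0.8341, 0.8646, 0.5776, 0.4950, 0.3300]
P2 = [-0.0169, -0.1559, -0.3773, 0.3956, 0.3391, 0.2261]

lambda_1 = sum(x * x for x in P1)                 # contribution of first factor
lambda_2 = sum(x * x for x in P2)                 # contribution of second factor
cross_product = sum(x * y for x, y in zip(P1, P2))
communalities = [x * x + y * y for x, y in zip(P1, P2)]

print(round(lambda_1, 4), round(lambda_2, 4), round(cross_product, 4))
print([round(h, 3) for h in communalities])
```

Because the reduced matrix is of rank 2, the two contributions together exhaust the total communality, here the sum of the six values in the last line.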


11 and 12.

   j        S_j       a_j1      Supplementary Calculations
   1       3.292      .793      T = 17.250
   2       2.794      .673      √T = 4.153
   3       2.650      .638      a_j1 = S_j/4.153
   4       3.702      .892
   5       2.715      .654      D_1 = Σ a_j1 = 4.155 = √T
   6       2.097      .505
 Total    17.250     4.155
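
The first centroid coefficients in the table above come from the column sums alone: each sum S_j is divided by the square root of the grand total T. A minimal sketch, using the printed column sums (the function name is illustrative):

```python
import math

# Column sums S_j of the correlation matrix (with communalities), as printed.
S = [3.292, 2.794, 2.650, 3.702, 2.715, 2.097]


def centroid_coefficients(column_sums):
    """First centroid factor: a_j1 = S_j / sqrt(T), with T the sum of the S_j."""
    total = sum(column_sums)
    root = math.sqrt(total)
    return total, root, [s / root for s in column_sums]


T, root_T, a1 = centroid_coefficients(S)
print(round(T, 3), round(root_T, 3))
print([round(a, 3) for a in a1])
```

The check D_1 = Σ a_j1 = √T holds identically for unrounded coefficients; the printed 4.155 differs from 4.153 only because the rounded coefficients were summed.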



13 and 15. The first-factor residuals, _1r_jk, are in the principal diagonal and below it;
the values above the diagonal are those after reflection of variables (exercise 14)
and are used in calculating the second-factor coefficients in exercise 15.

Variable       -1       -2       -3        4        5        6     Supplementary Calculations
   1          .089     .325     .234     .234     .218     .199
   2          .325     .041     .022     .164     .113     .113
   3          .234     .022     .007     .062     .090     .111
   4         -.234    -.164    -.062     .149     .147     .161
   5         -.218    -.113    -.090     .147     .118     .154
   6         -.199    -.113    -.111     .161     .154     .108
 Total       -.003    -.002     .000    -.003    -.002     .000
 s_j1        1.299     .778     .526     .917     .840     .846    T_1 = 5.206
 ε_j s_j1   -1.299    -.778    -.526     .917     .840     .846    √T_1 = 2.282
 a_j2        -.569    -.341    -.230     .402     .368     .371    D_2 = .001, Σ ε_j a_j2 = 2.281 = √T_1

14.


Variable 

Variable 

Reflected 

No. neg. 
before refl. 

No. negatives after 
reflection 



1 

2 

3 

1 

2 

3 

4 

5 

6 

— 

3 

3 

! 3 

3 

3 

3 

2 

4 

4 

2 

2 

2 

1 

1 

5 

1 

1 

1 

0 

0 

0 

0 

0 

0 

Total 

Difference 


18 

16 

10 

0 




1 

6 

10 


16. Second factor residuals, _2r_jk:

Variable       1        2        3        4        5        6
   1         -.235
   2          .131    -.075
   3          .103    -.056    -.046
   4         -.005    -.027     .030    -.013
   5         -.009     .012    -.005    -.001    -.017
   6          .012     .014    -.026     .012     .017    -.030


17.

                  Common-Factor Coefficients                 Communality
Variable             C_1         C_2         Original (1)   Calculated (2)   (1) - (2)
   1                .793       -.569            .718            .953           -.235
   2                .673       -.341            .494            .569           -.075
   3                .638       -.230            .414            .460           -.046
   4                .892        .402            .945            .957           -.012
   5                .654        .368            .546            .563           -.017
   6                .505        .371            .363            .393           -.030
 Total             4.155        .001           3.480           3.895           -.415
 Contribution
   of factor       2.967        .928             -               -               -
 Percent of total
   orig. comm.      85.3        26.7             -             111.9           -11.9

CHAPTER 9 

1. For the simple case of m = 1 formula (9.9) becomes

   f_j = Σ_{k=1, k≠j}^{n} (r_jk - b_j a_k)^2          (j fixed).

Then, expanding the quadratic, this becomes

   f_j = I_j b_j^2 - 2Q_j b_j + C_j,

or, by completing the square,

   f_j = I_j (b_j - Q_j/I_j)^2 + K_j,

where

   I_j = Σ a_k^2,   Q_j = Σ a_k r_jk,   C_j = Σ r_jk^2,   and   K_j = C_j - Q_j^2/I_j,

and it is understood that all summations extend over the range k = 1 to n but
not equal to j. From the last formula for f_j it is seen that its minimum value
occurs when b_j = Q_j/I_j. Thus, the minres solution is given by:

   b_j = Q_j/I_j     if Q_j/I_j ≤ 1,
   b_j = 1           if Q_j/I_j > 1.

(a) Denoting the original loadings by a_j and the minres loadings by b_j with a
superscript for the iteration number, the solutions are given in the following
table:

   j      a_j     b_j^(1)    b_j^(2)    b_j^(3)
   1     1.05     1.000      1.000      1.000
   2      .90      .918       .913       .912
   3      .80      .810       .809       .809
   4      .70      .706       .707       .707
   5      .60      .604       .605       .605
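
The iterations tabulated in part (a) can be reproduced with a short routine. The sketch below makes three assumptions that are not stated in the answer: the correlations are generated from the original loadings as r_jk = a_j a_k, each variable is updated in turn using the most recent values of the other loadings, and a loading whose unrestricted value exceeds unity in absolute value is set to ±1. The names `I_j` and `Q_j` follow the notation used above for the two sums.

```python
a0 = [1.05, 0.90, 0.80, 0.70, 0.60]  # original (Heywood) loadings
n = len(a0)

# Off-diagonal correlations implied by the original loadings (assumption).
r = [[a0[j] * a0[k] for k in range(n)] for j in range(n)]


def minres_sweep(b):
    """One pass over the variables, updating each loading in turn."""
    b = list(b)
    for j in range(n):
        I_j = sum(b[k] ** 2 for k in range(n) if k != j)
        Q_j = sum(r[j][k] * b[k] for k in range(n) if k != j)
        ratio = Q_j / I_j
        b[j] = max(-1.0, min(1.0, ratio))  # keep the communality at or below 1
    return b


iterations = []
b = a0
for _ in range(3):
    b = minres_sweep(b)
    iterations.append([round(x, 3) for x in b])

for row in iterations:
    print(row)
```

Under these assumptions the three sweeps agree with the tabulated columns: the first loading is held at 1.000 throughout, and the remaining loadings settle within three iterations.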
(b) For the Heywood solution (with the communality of the first variable being
greater than one) all the residuals vanish and the objective function is
precisely zero. The minres solution yields an f = .004441 from the residuals
in the following table:



3.

         U_m for N equal to:                     χ² at level:
  m     12      50     100     200       ν     0.1%    1%     5%     Action regarding null hypothesis
  1    47.5   211.8   427.9   860.1     10     29.6   23.2   18.3    Reject at all levels, all N.
  2     3.8    17.1    34.6    69.6      6     22.5   16.8   12.6    For N = 12, accept at all levels;
                                                                     for N = 50, accept at 0.1% level,
                                                                     reject at 5% level; for N > 50,
                                                                     reject at all levels.
  3     0       0       0       0        3     16.3   11.3    7.8    Accept at all levels, all N.


4.

    j          F_1       F_2      h_j^2
    1         .860     -.017      .740
    2         .834     -.156      .720
    3         .865     -.377      .890
    4         .577      .396      .490
    5         .495      .339      .360
    6         .330      .226      .160
 Variance    2.870      .490     3.360



CHAPTER 10 

1. (a) L = Π_{i=1}^{N} f(x_i; p) = Π_{i=1}^{N} p^{x_i} (1 - p)^{1 - x_i} = p^{Σ x_i} (1 - p)^{N - Σ x_i}.

In the example, N = 100 and Σ x_i = 18, so that the likelihood function
becomes: L = p^18 (1 - p)^82.

(b) log L = 18 log p + 82 log (1 - p).

(c) d(log L)/dp = 18/p - 82/(1 - p).

(d) 18(1 - p̂) - 82p̂ = 0, p̂ = 0.18.
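
The result of part (d) can be confirmed numerically by evaluating the log-likelihood of part (b) over a grid of values of p. This is only a sketch; the grid spacing of .001 is an arbitrary choice.

```python
import math


def log_likelihood(p, successes=18, n=100):
    """log L = successes * log p + (n - successes) * log(1 - p)."""
    return successes * math.log(p) + (n - successes) * math.log(1.0 - p)


def score(p, successes=18, n=100):
    """Derivative of log L with respect to p, as in part (c)."""
    return successes / p - (n - successes) / (1.0 - p)


grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat, score(0.18))
```

The maximizing value is the observed proportion 18/100, at which the derivative in part (c) vanishes.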

2. (a)

First Iteration for Maximum-Likelihood Estimates of General-Factor Weights
for Five Psychological Tests of Table 7.1

Line   Instruction                    1         2         3         4         5
  1    a_j1                         .707      .673      .604      .554      .398
  2    d_j^2 = 1 - a_j1^2          .5002     .5471     .6352     .6931     .8416
  3    L_1/L_2                     1.413     1.230      .951      .799      .473
  4    RL_3                        2.847     2.727     2.402     2.259     1.611
  5    L_4 - L_1                   2.140     2.054     1.798     1.705     1.213
  6    a_j1 = L_5/√(L_3 · L_5)      .706      .677      .593      .562      .400
  7    d_j^2 = 1 - L_6^2           .5016     .5417     .6484     .6842     .8400

The uniquenesses are in line 2 of the above table, and the first iteration is
contained in lines 3-7. The square root of the inner product of the vectors
in line 3 and line 5 is √(L_3 · L_5) = 3.033.

(b) The convergence is considered satisfactory after four iterations when four
of the five weights have stabilized to three decimal places, and the coefficient
for test 2 differs by one unit in the third decimal place from its preceding
value. The maximum-likelihood estimates of the five coefficients are:
.705, .682, .587, .566, .399.

3. (a) The work can be organized conveniently as in Tables 10.4 and 10.5, and the
required sum is

   Σ_{j<k=1} r̄_jk^2 / (d_j^2 d_k^2) = .007626.
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ip = .46) if the real descripi 

common factor. . ^cautions, the data are 

of accuracy). The rep { he following table. 

residuals in the lower triangle .-- 


-.00143 
.00002 
.00021 
.00046 


.01118 

.03768 

-.02317 

-.00115 


.97243 

.11660 

-.00528 

-.01236 


43866 

.71458 

.52000 

.00997 


.02195 

.86422 

.13429 

76768 


- _—i-- —Z"^ 221306. From (10.28), the 

* less should not be applied to cases o ^ ^ 

large-samp § 0 f the first principal factor:(ex^ interm ediate 

Starting with the coe T ass i st the reader, s 

* “- «na. factor we.gh. 



-1—-— 1 ' tV _~ residuals that at least a 

It should be immediately^obvious^pon com^ simply as an 

tomula (10 - m “ is f0 

i {ryi}ii) = I ' 477737 


nflr Touj ' since /v — snc tl • 

rejected Mdaf i = ? degrees of freedom w”hom° US,) ' ^ Va,ue of X 2 falls 
the co rre ,r on a ‘ kast a s “ond factor sZ^ZZT" 

g First Iteration for M ■ an adequate fit t0 

*‘™ate s of Two Fac(ors for ^ 

Line ' ~ -- 


Instruction 


a j2 


aj 2 


Li/L 3 

l 2 /l 3 

RL d 


/L 4.1 


RL< 


Uj 2 


‘ Lg)L 8 

1 A/ZTTi 


.793 

■508 

■ 1131 

7.011 

4.492 

19.683 

18.890 

• 688 

• 581 
.073 
4.376 


.682 

.362 

•4038 


•651 

.243 


.882 


.646 

•381 

•4375 


13 




■658 


L? 


•0937 


1.689 
f -896 

1 1 

O' 0 

*/”> r- 
<N ^ 1 

1 20.464 
j -9.814 

1.477 

-.871 

17.872 

17.190 

18.241 

17.590 

26.737 

25.855 

1 19.889 

1 9.243 

.626 

•641 I 

' -942 

J .701 

•262 

-.100 

3.815 

-1.193 | 

-1.436 
2.573 

-8.074 

-7.651 

-1.760 

-6.536 

-6.155 

-1.771 

•573 

•387 J 

-.265 

— .266 j 


•6076 


•4394 


•0424 


16.102 

15.601 

.568 


-5.831 
- 5.455 
-1.903 


•4378 


SsSS^^SPSssssg 

p “" ^ m >»■ 
10.

               Initial Values     After 1 Iteration    After 2 Iterations   After 3 Iterations
Variable j      a_j1    a_j2        a_j1    a_j2         a_j1    a_j2         a_j1    a_j2
    1           .858   -.328        .890   -.210         .894   -.183         .894   -.182
    2           .849   -.414        .886   -.325         .895   -.300         .896   -.301
    3           .810   -.412        .855   -.313         .864   -.290         .866   -.291
    4           .825   -.339        .863   -.237         .871   -.210         .871   -.209
    5           .747    .561        .675    .648         .657    .670         .656    .663
    6           .637    .507        .574    .573         .556    .586         .554    .577
    7           .561    .488        .506    .573         .491    .592         .490    .587
    8           .619    .371        .566    .404         .553    .413         .551    .403

11. For n = 8 and m = 2, the number of degrees of freedom is ν = 13. The sum of
squared residuals divided by the product of uniquenesses is

   Σ_{j<k=1} r̄_jk^2 / (d_j^2 d_k^2) = .263,

and for N = 305 formula (10.29) gives U_2 = 80.2. Since this value is so much in
excess of χ² = 27.7, corresponding to the 1% level of significance, the hypothesis
is rejected. An interpretation from the viewpoint of goodness of fit may be put
as follows: According to χ² = 80.2 with ν = 13, the accompanying probability
is P < .001; that is, in less than 1 out of 1000 trials, in random sampling, can a
fit as bad or worse than that observed be expected if the real description of the
data were actually in terms of two common factors. Clearly the actual fit is a
bad one, and another hypothesis is indicated.


12. For m = 2 and ν = 13, when N = 120, then

   U_2 = 120(.263) = 31.6.

This value is still in excess of the critical value of 27.7, and the hypothesis would
be rejected. However, for m = 3 and ν = 7, U_3 = 120(.151) = 18.1 is just below
the χ² = 18.5 which produces a probability P = .01, and it may be said that three
factors are just significant at the 1% level in describing the correlations
of Table 10.2 (if they were based on only 120 cases).

13. For m = 2 and ν = 13, when N = 100, then

   U_2 = 100(.263) = 26.3.

Since this value is just below that required to produce P = .01, the hypothesis
would not be rejected, i.e., two factors would suffice at the 1% level. Of course,
if 3 factors were obtained then the U_3 = 15.1 for ν = 7 would certainly be
significant at the 1% level (actually P = .04 for this χ²).
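
The arithmetic of answers 11 to 13 can be collected in a few lines. The sketch assumes, as the worked figures suggest, that the statistic is the sample size multiplied by the weighted sum of squared residuals, and that the degrees of freedom are ν = ½[(n - m)² - (n + m)]; formula (10.29) itself is not reproduced in this answer key, so both relations should be read as assumptions. The critical values are those quoted in the answers.

```python
def degrees_of_freedom(n, m):
    """nu = ((n - m)^2 - (n + m)) / 2 for n variables and m factors."""
    return ((n - m) ** 2 - (n + m)) // 2


def u_statistic(sample_size, weighted_residual_sum):
    """Large-sample statistic taken as N times the weighted residual sum."""
    return sample_size * weighted_residual_sum


# 1% critical values of chi-square quoted in answers 11 and 12.
CHI2_1PCT = {13: 27.7, 7: 18.5}


def rejected(sample_size, weighted_residual_sum, n, m):
    """True when the hypothesis of m factors is rejected at the 1% level."""
    nu = degrees_of_freedom(n, m)
    return u_statistic(sample_size, weighted_residual_sum) > CHI2_1PCT[nu]


for N in (305, 120, 100):
    print(N, round(u_statistic(N, 0.263), 1), rejected(N, 0.263, 8, 2))
print(120, round(u_statistic(120, 0.151), 1), rejected(120, 0.151, 8, 3))
```

The same residuals thus lead to rejection of two factors for N = 305 and N = 120 but not for N = 100, which is the point of the three exercises.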
CHAPTER 11




3. r_T1T2 = 5.686/[(3.659)(3.201)] = .4855.

4. The factor structure appears as the 8 x 2 matrix in the lower left corner of the 
answer to ex. 5. 

5. The oblique factor pattern is the 8 x 2 matrix, calculated in the right-hand block 
of the table, while the orthogonal factor matrix is obtained in the middle block. 


Variable 

Original Matrices 

Square Root Operation 

Row-by-Row 
Multiplication with T~ 1 

T 

t 2 

1.0Q00 

* 

1.0000 


1.0000 

0 

.4855 

1.0000 

.4855 

.8742 

-.0000 

1.0000 

T 

t 2 

1.0000 

0 

1.0000 

-.5554 

1.3085 

* 

0 

1.0000 

0 

1.1439 

-.6353 

1.3085 

1 

2 1 

.916 

.485 

.916 

.046 

.890 

.053 

.939 

.435 

.939 

-.024 

.952 

-.027 

3 

.903 

.400 

.903 

-.044 

.927 

— .050 

4 

.902 

.455 

.902 

.020 

.891 

.023 

5 

.455 

.935 

.455 

.817 

.001 

.935 

6 

.375 

.803 

.375 

.710 

-.019 

.812 

7 

.312 

.761 

.312 

.697 

-.075 

.797 

8 

.412 

.702 

.412 

.574 

.093 

.657 

Total 

Check 

7.699 

7.461 

7.699 

7.699 

4.259 

4.259 

5.333 

5.334 

4.873 

4.872 


6. The reproduced correlations in the table were obtained by row-by-row multiplication
of the orthogonal factor matrix A by itself. These numbers may be checked by
row-by-row multiplication of the oblique structure S by the oblique pattern P,
or in reverse order.



7. The residuals are recorded in the table. Without applying any approximate 
statistical tests (see ex. 12, chap. 8) it seems reasonable to conclude that these 
residuals are random deviations around zero. Two multiple-group factors account 
for 99.93 % of the original communality of 5.960. 



8. s_T^2 = Σ T_i^2/N = Σ (z_1i + z_2i + z_3i)^2/N
         = Σ z_1i^2/N + Σ z_2i^2/N + Σ z_3i^2/N + 2(Σ z_1i z_2i/N + Σ z_1i z_3i/N + Σ z_2i z_3i/N)
         = s_1^2 + s_2^2 + s_3^2 + 2(r_12 + r_13 + r_23),

   s_T^2 = Σ_{j,k=1}^{3} r_jk, where r_jj = 1 or r_jj = h_j^2 depending on the model.

CHAPTER 12 


.990 

.016 

-.130 - 

.044 

.257 

.961 

.076 - 

.012 

.456 

.112 

.878 

.026 

.386 

.250 

.109 

.881 

1 13 = 

.597, 

b 14 = -067, 



b 53 =-. 016, b 24A = . 195.
3. b_11 = .096, b_12 = .248, b_13 = .706;

b_13,1 = .070, b_13,2 = .619, b_13,3 = .469.

These values agree perfectly with those obtained in Table 12.5 (by successive 
rotations in a plane). 

5. It should be noted that variables 4, 5, 6 fall on a straight line. Hence, one axis 
can be passed through these three points and the other at right angles to it. The 
direction of the second axis may have to be reversed in order to make the algebraic 
signs of the factor loadings agree with those in Table 5.8. 


CHAPTER 13 

1. There is generally close agreement between corresponding values in the two 
solutions, with a number being identical. The largest discrepancy in structure 
values is .008 and the largest in pattern coefficients is .011. 

2. s_v1 = 2.843, s_v2 = 4.214, s_v3 = 3.147.

3. u_1 = .6535C_1 + .0735C_2 - .5012C_3
   u_2 = .8448C_1 + .3759C_2 + .2330C_3
   u_3 = .6841C_1 - .5697C_2 + .1405C_3

4. D(Ou_1) = .8268, D(Ou_2) = .9536, D(Ou_3) = .9013.

5.
           .7904    .8859    .7590
    T =    .0889    .3942   -.6321
          -.6062    .2443    .1559


6.
                   1.000    .587    .449
    Φ = T'T =       .587   1.000    .461
                    .449    .461   1.000

7.
                    .743    .406    .430
                    .445    .264    .204
                    .604    .324    .157
                    .559    .386    .266
                    .451    .807    .429
                    .489    .807    .339
    S = CT =        .447    .846    .355
                    .537    .717    .420
                    .435    .841    .311
                    .075    .311    .712
                    .302    .357    .677
                    .316    .222    .724
                    .582    .419    .723
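
Answers 3 to 6 are connected by simple normalization: each composite u_p is divided by its length D(Ou_p) to give a column of T, and the factor correlations are the inner products of those columns. The sketch below checks this chain. The first coefficient of u_2, which is illegible in the printed answer, is taken as .8448, the value implied by D(Ou_2) = .9536; the variable names are illustrative.

```python
import math

# Coefficients of the composites u_1, u_2, u_3 on C_1, C_2, C_3 (answer 3).
u = [
    [0.6535, 0.0735, -0.5012],
    [0.8448, 0.3759, 0.2330],   # first entry reconstructed from D(Ou_2)
    [0.6841, -0.5697, 0.1405],
]

lengths = [math.sqrt(sum(c * c for c in row)) for row in u]          # D(Ou_p)
t_columns = [[c / d for c in row] for row, d in zip(u, lengths)]     # columns of T

# Phi = T'T: inner products of the normalized columns.
phi = [
    [sum(a * b for a, b in zip(t_columns[p], t_columns[q])) for q in range(3)]
    for p in range(3)
]

print([round(d, 4) for d in lengths])
for row in phi:
    print([round(x, 3) for x in row])
```

The diagonal of Φ is unity by construction, and the off-diagonal values are the cosines of the angles between the primary axes.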


8. The primary-factor pattern P for these data is exhibited as the matrix A in
Table 12.1.

9. The total direct contributions are given in the diagonal and the total joint
contributions below the diagonal of the table:



The grand total of the contributions is 6.981 as compared with the centroid total 
communality of 6.966. 

10. W = VΨ^{-1}

      = V(Λ'Λ)^{-1}          from (13.35)

      = VΛ^{-1}(Λ')^{-1}     from (3.16)

      = A(Λ')^{-1}           from (13.30).

CHAPTER 14 

1. Q = 3.983 for the graphical solution of Table 12.3 (for a graphical solution based
on a centroid pattern, which was given in the first edition of this text, Q = 4.061),
while the maximum Q = 4.091 is for the quartimax solution of Table 14.2.

2. (a) Quartimax solution for eight political variables: 



(b) Q = 4.040 for the quartimax solution; Q = 4.009 for the principal-factor 
solution. 

3. V = 23.61 for quartimax solution; V = 23.75 for graphical solution; V = 24.30 
for varimax solution. 

4. (a) Varimax solution for eight political variables:

(b) V = 9.58 for principal-factor solution; V = 15.95 for varimax solution.

5. (a) (V - S)_rms = .040; (b) (V - Q)_rms = .052; (c) (Q - S)_rms = .091.



CHAPTER 15 

(b) Γ = [ .4344   .0799 ]
        [ .0799   .3375 ]

(c) N_1^2 - .7719N_1 + .1402 = 0;
    N_1 = .2924.

(d) λ_11 = .4904, λ_21 = -.8715.

    C = [ 1.3451   -.3479 ]
        [ -.3478    .2925 ]

    N_2^2 - 1.6376N_2 + .2724 = 0;
    N_2 = .1879.

    λ_12 = .2878, λ_22 = .9576.

(g) Λ = [  .5784   -.2659 ]
        [ -.8158   -.9640 ]
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4. (a) A —! 


1.0030 

-.54221 

(b) T - 1 

" .8797 

.8396' 

_ .9574 

.6194 J 

1 — 

_—.4755 

.5432_ 


( c) D = T'A = 


.877 .000 
.000 .877. 


(d) p» _ jj- ly' _ 


.893 

.052 


.955 

-.027 


.931 .878 .008 
-.050 .031 .928 


(e) r Tl T 2 — -4803. 

5. t _ (.9022 — .4881 \ 

A ~ \ .6767 .7709/' 

The matrix for normalizing the rows of A" 1 is: 


-.023 -.057 .082 

.824 .767 .685 


6 . 


_ | .97488 0 

U ” 0 .97488 


and 


The primary-factor pattern is given by: 


p = VD 1 = 


.8601 

.6181 


.5231 \ n 
.7945)’ 


.9934 

0 


p = VD 1 = 


D --! 1 ^ 577 


.0979 

1.0111" 

.9600 

-.1171 

.0271 

.9832 

.7893 

.3610 

.9876 

-.1175. 

° 1 

; D = 

.9934 

.0368 

.9974 ' 

.9461 

-.0673 

.0856 

.9761 

.8053 

.3980 

.9736 

.0662 _ 


o 

1.02577 ' 


1.0066 

0 


0 \ 

1.0066/’ 


7. From the solution of ex. 5, the first variable may be expressed as follows:

   z_1 = -.0979T_1 + 1.0111T_2 + .1104U_1,

and its common-factor variance is given by

   h_1^2 = (-.0979)^2 + (1.0111)^2 + 2(-.0979)(1.0111)r_T1T2,

where the correlation between the two oblique factors is required. This may be
obtained by use of (15.42),

   T' = DΛ^{-1} = [ .8795   -.4758 ]
                  [ .6597    .7515 ],

and then from (13.16) it follows that r_T1T2 = .2226. Hence,

   h_1^2 = .0096 + 1.0223 - .0441 = .9878.
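
The computation in answer 7 has two parts: the factor correlation is the inner product of the two rows of T', and the communality of a variable described by two correlated factors includes a cross-product term. A minimal sketch using the printed values (variable names are illustrative):

```python
# Rows of T' as printed: direction cosines of the two oblique factors.
T1 = (0.8795, -0.4758)
T2 = (0.6597, 0.7515)

# Pattern coefficients of variable 1 on the two oblique factors.
b1, b2 = -0.0979, 1.0111

# Correlation between the factors: inner product of their direction cosines.
r_t1t2 = sum(x * y for x, y in zip(T1, T2))

# Communality with correlated factors: b1^2 + b2^2 + 2 b1 b2 r.
terms = (b1 ** 2, b2 ** 2, 2 * b1 * b2 * r_t1t2)
h2 = sum(terms)

print(round(r_t1t2, 4), [round(t, 4) for t in terms], round(h2, 4))
```

The remainder 1 - h_1^2 is the uniqueness of the variable, which agrees with the square of the coefficient .1104 of U_1.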
13. (a) F_0 = 0.

(b) The structure matrix is obtained by postmultiplying the pattern matrix by
the factor correlation matrix, according to (2.43). The result is

    [  .97015   -.78011 ]
    [  .48508   -.39005 ]
    [ -.51085    .63530 ]

14. F 0 = .83956. The computer program calculates the sum over the factors only 
under the constraint that the indices for the factors be different. Formula (15.45) 
requires that one index always be less than the other. Hence, the computer output, 
F 0 = 1.679149, is just twice that calculated by (15.45). 

15. In terms of the expression (15.45), but with each b_jp replaced by b_jp/h_j, the initial
value is .976 and the final value is .138. The complete direct oblimin (δ = 0)
solution consists of the following:

              Factor Pattern            Factor Structure
   j         b_j1        b_j2          r_jT1       r_jT2
   1       -.07316     1.01078        .12028      .99678
   2        .88580     -.08021        .87045      .08931
   3        .05653      .96704        .24160      .97786
   4        .76056      .34397        .82638      .48953
   5       1.00303     -.10971        .98203      .08224

           Factor Correlations
              T_1         T_2
   T_1       1.          .19137
   T_2       .19137     1.


CHAPTER 16 


1. In formula (16.12) the matrix A consists of the first two principal components from
Table 8.1, the matrix B comes from Table 14.6, and the matrix Λ_m is the diagonal
matrix of the first two eigenvalues retained. The key parts of the computations
include:

   B'A = [ 2.35810   -1.02656 ]
         [ 1.64172    1.47451 ],

   Λ_2^2 = [ 8.25591    0       ]
           [ 0          3.22799 ],

   B'AΛ_2^{-2} = [ .28562   -.31801 ]
                 [ .19885    .45678 ].

Finally, postmultiplying the last matrix by A' produces the identical values for
the coefficients of the z's as those arrived at in the text by use of (16.7).

2. F_1 = .26z_1 + .14z_2 + .60z_3, R_1 = .96,
   F_2 = .31z_1 + .53z_2 - .16z_3, R_2 = .67.

3. Following the instructions in Table 3.3, place the identity matrix to the right of
R, apply the square root operation of Table 16.3 to this identity matrix, and then
column-by-column multiplication of the resulting matrix yields:


             5.444  -2.080   -.374  -2.432   -.845   -.385    .515    .277
            -2.080   6.632  -3.362  -1.005   1.004    .019   -.355   -.881
             -.374  -3.362   5.007   -.912   -.446   -.096    .394    .267
R^{-1} =    -2.432  -1.005   -.912   4.728   -.245    .424   -.430    .075
             -.845   1.004   -.446   -.245   3.984  -1.559  -1.485   -.654
             -.385    .019   -.096    .424  -1.559   2.529   -.131   -.391
              .515   -.355    .394   -.430  -1.485   -.131   2.289   -.253
              .277   -.881    .267    .075   -.654   -.391   -.253   1.913


4. T̂_2 = (.484 .435 .399 .454 .932 .813 .740 .724)R^{-1}{z_1 z_2 ··· z_8}

       = -.042z_1 + .131z_2 - .069z_3 + .021z_4 + .612z_5 + .199z_6 + .077z_7
         + .162z_8.

The result is identical with the second equation in (16.36) except for trivial
differences in the third decimal place.
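
The regression weights of answer 4 are the row of factor-structure values postmultiplied by R^{-1}. The sketch below carries out that multiplication with the inverse printed in answer 3. One assumption is made: the signs in the last row of R^{-1} are taken from the last column, since the inverse of a symmetric matrix must itself be symmetric.

```python
# R^{-1} from answer 3 (last row signed by symmetry with the last column).
R_INV = [
    [5.444, -2.080, -0.374, -2.432, -0.845, -0.385, 0.515, 0.277],
    [-2.080, 6.632, -3.362, -1.005, 1.004, 0.019, -0.355, -0.881],
    [-0.374, -3.362, 5.007, -0.912, -0.446, -0.096, 0.394, 0.267],
    [-2.432, -1.005, -0.912, 4.728, -0.245, 0.424, -0.430, 0.075],
    [-0.845, 1.004, -0.446, -0.245, 3.984, -1.559, -1.485, -0.654],
    [-0.385, 0.019, -0.096, 0.424, -1.559, 2.529, -0.131, -0.391],
    [0.515, -0.355, 0.394, -0.430, -1.485, -0.131, 2.289, -0.253],
    [0.277, -0.881, 0.267, 0.075, -0.654, -0.391, -0.253, 1.913],
]

# Correlations of the eight variables with the second factor (answer 4).
structure = [0.484, 0.435, 0.399, 0.454, 0.932, 0.813, 0.740, 0.724]

# Regression weights: the row vector of structure values times R^{-1}.
weights = [
    sum(structure[j] * R_INV[j][k] for j in range(8)) for k in range(8)
]
print([round(w, 3) for w in weights])
```

With the symmetric signs restored, all eight weights agree with the printed equation to within a unit in the third decimal place.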


5. To prove that L is a symmetric matrix it is sufficient to show that its transpose
is equal to the matrix itself, namely,

L' = (Φ^{-1} + A'D^{-2}A)' = (Φ^{-1})' + (A)'(D^{-2})'(A')' = Φ^{-1} + A'D^{-2}A = L,

since Φ (and hence Φ^{-1}) is symmetric, as is also the diagonal matrix D^{-2}.


6. F_1 = .26z_1 + .13z_2 + .62z_3, R_1 = .96,
   F_2 = .31z_1 + .54z_2 - .16z_3, R_2 = .67.


7. 


E" 1 = 


2061 

0 


.0089” 
.2864_ • 


B' = [ .248   .367   .226   .185   .020   .003   -.001   .011 ]
     [ .042   .000   .011   .023   .578   .200    .140   .120 ]

This answer is based upon a computer output which produced the quartimin 
reference structure of Table 15.3, and therefore may vary slightly from the results 
that will be obtained by the method suggested in the exercise. 


9. (a)

$$
B' =
\begin{bmatrix}
-.038 & -.012 & \cdots & .035 \\
.014 & .008 & \cdots & .113 \\
.261 & .102 & \cdots & -.003 \\
-.010 & -.004 & \cdots & .080
\end{bmatrix}
$$

(b) By formula (16.25), employing the primary-structure values (which would have to be determined); or by formula (16.27), involving the complete correlation matrix.
10. The computations in Table 16.9 are based on an approximation to $R^{-1}$, while the coefficients in exercise 3 were determined from the actual $R^{-1}$ for the observed correlations. The two sets of values would agree perfectly if the residual matrix were the null matrix.

11. Of course the individual entries in these two $8 \times 8$ matrices differ considerably, since they were derived by very intricate and distinct computations starting from different initial values. Each inverse, when multiplied by the original matrix upon which it is based, either $R$ or $(R^{\dagger} + D^2)$, will produce the identity matrix. But since the original matrices differ to the extent that the residual matrix does not vanish, the respective inverses must differ. Nonetheless, reasonable approximations to some of the desired statistics can be made from the approximate $R^{-1}$ given in Table 16.9 (see ex. 22, chap. 5).

12. $\Gamma' S = J^{-1} C' S$, postmultiplying (16.58) by $S$,

$\quad = J^{-1} C' P \Phi$, substituting for $S$ from (2.43),

$\quad = J^{-1} P' D^{-2} P \Phi$, substituting for $C$ from its definition,

$\quad = J^{-1} J \Phi$, from the definition of $J$,

$\quad = \Phi$.


13. (a) $F_1 = .065z_1 - .530z_2 + 1.465z_3$,

$F_2 = 1.337z_1 + 3.820z_2 - 4.155z_3$.

(b) $R_1 = \sqrt{1 - .790/3.113} = .864$,

$R_2 = \sqrt{1 - 15.333/3.113}$ = imaginary; the measure of information is only 3.113/15.333 = .203 instead of a value greater than unity.

$$
\begin{bmatrix}
.065 & -.530 & 1.465 \\
1.337 & 3.820 & -4.155
\end{bmatrix}
\begin{bmatrix}
.90 & .60 \\
.85 & .65 \\
.95 & .55
\end{bmatrix}
=
\begin{bmatrix}
1.000 & .500 \\
.500 & 1.000
\end{bmatrix}
$$
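
The numbers in answer 13 can be verified with a few lines of arithmetic. The sketch below, assuming NumPy is available, multiplies the coefficient matrix of part (a) by the 3 × 2 matrix printed after part (b) and recomputes the quantities of part (b); the helper `multiple_R` and the variable names are illustrative and not taken from the text.

```python
import math
import numpy as np

# Coefficients of F_1 and F_2 from part (a), one factor per row.
B_t = np.array([[0.065, -0.530,  1.465],
                [1.337,  3.820, -4.155]])

# The 3 x 2 matrix printed in the answer.
S = np.array([[0.90, 0.60],
              [0.85, 0.65],
              [0.95, 0.55]])

prod = B_t @ S
print(np.round(prod, 3))   # close to [[1, .5], [.5, 1]] up to rounding of the inputs


def multiple_R(ratio):
    """Return sqrt(1 - ratio), or None when the radicand is negative."""
    radicand = 1.0 - ratio
    return math.sqrt(radicand) if radicand >= 0 else None


R1 = multiple_R(0.790 / 3.113)     # real value
R2 = multiple_R(15.333 / 3.113)    # None: the square root would be imaginary
info = 3.113 / 15.333              # the "measure of information" quoted in (b)
print(R1, R2, round(info, 3))
```

Because the coefficients are printed to three decimals, the product reproduces the right-hand matrix only to within a few thousandths.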
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Statistical Tables 


[...] the work of Spearman and Holzinger [443] on the computing [...] differences. When Hotelling [...] included discussions of the [...] major contribution [...] the statistical theory of factor analysis, when Lawley [320] introduced the maximum-likelihood [...] coefficients and [...] additional assumptions which led to the [...] required. In Table A [...] standard error of [...] given for samples from N = 20 to [...] to r = .75. For the same range of values of N and r, the standard error of a factor coefficient [...] the normal curve are given in Table C [...] shown for values of x/s from 0 to [...] the probability of exceeding a deviation of ±x/s is given. In Table D the χ² distribution is presented for a few selected values of P, where

$$
P(\chi^2) = \frac{1}{2^{\nu/2}\,[(\nu - 2)/2]!} \int_{\chi^2}^{\infty} (\chi^2)^{(\nu-2)/2}\, e^{-\chi^2/2}\, d(\chi^2)
$$

is the probability of obtaining a value of χ² as great as, or greater than, the value actually obtained.
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Table A 


Standard Errors of Residuals with One Factor Removed 
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Table B 

Standard Errors of Factor Coefficients 
$\sigma_a = \tfrac{1}{2}\sqrt{(3/r - 2 - 5r + 4r^2)/N}$
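
As a worked illustration of the formula heading Table B, the sketch below evaluates it for one pair of values. It assumes the garbled heading reads as one half of the square root of $(3/r - 2 - 5r + 4r^2)/N$; the function name `se_factor_coefficient` is hypothetical.

```python
import math


def se_factor_coefficient(r: float, N: int) -> float:
    """Standard error of a factor coefficient for correlation level r and sample size N.

    Assumed reading of the Table B heading:
        sigma_a = (1/2) * sqrt((3/r - 2 - 5r + 4r^2) / N)
    """
    radicand = (3.0 / r - 2.0 - 5.0 * r + 4.0 * r * r) / N
    return 0.5 * math.sqrt(radicand)


# Example: r = .50, N = 100 gives (6 - 2 - 2.5 + 1)/100 = .025 under the root.
print(round(se_factor_coefficient(0.5, 100), 4))
```

Under this reading the standard error shrinks in proportion to $1/\sqrt{N}$ and vanishes at $r = 1$, where the expression in parentheses is zero.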
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Table C 

Area under the Normal Curve 


$$
\text{Area} = \frac{1}{\sqrt{2\pi}\, s} \int_{0}^{x} e^{-\frac{1}{2}(x/s)^2}\, dx
$$
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Index to Illustrative Examples 


| Variables (a) | Data (b) | Direct Solutions (c) | Derived Solutions (d) | Factor Measurements |
|---|---|---|---|---|
| 5 Hypothetical | 117 | D: 117; Min: 391, 414-15; T-F: 117 | | |
| 5 Psychological | 116 | M-L: 391-92, 416-17; T-F: 116 | | |
| 5 Socio-economic | 14, 397, 425-26 | Min: 204, 391, 415; M-L: 229, 392, 417; PC: 137; PF: 161-63 | D-Obl: 340, 398, 427; Obl-B: 396-97, 425; Obl-Q: 396, 425; Var: 309-10 | 349, 359-60, 398, 427 |
| 6 Hypothetical | 43, 88, 89 | D: 92; Min: 391, 415; PF: 390, 411 | Subj: 394, 422 | |
| 8 Emotional | 164 | Min: 206-7; PF: 163-65 | | |
| 8 Physical | 80, 81, 147-54, 222, 357, 358, 386, 394, 395, 407, 422, 423-24 | M-G: 393, 420-21; Min: 204-6; M-L: 222-29, 392-93, 419; PF: 146-55, 390, 412 | D-Obl: 336; Obl-B: 328; Obl-C: 328; Obl-Q: 322, 328, 396-97, 425-26; Qmax: 302-3; Subj: 262-64, 277-88; Var: 307-9 | 355-57, 361, 365-68, 371-73, 398-400, 428-29 |
| 8 Political | 165-66, 385, 407 | PF: 165-66 | Binorm: 332; D-Obl: 340; Obl-B: 332; Obl-C: 332; Obl-Q: 332, 395-96, 424; Qmax: 395, 423; Var: 395, 424 | |
| 9 Psychological | 244 | M-G: 244-46 | | |
| 13 Psychological | 178, 394, 422 | C: 177-86; M-L: 230 | Subj: 252, 264-67, 394, 422 | |
| 24 Psychological | 124-5, 128, 130, 132-33, 330, 331, 339, 393-94, 421 | BiF: 129; Min: 207-10; M-L: 231; PC: 167-68; PF: 168-70 | D-Obl: 338; Obl-B: 329, 397, 426; Omax: 319; Qmax: 305, 311; Subj: 311; Var: 311 | 399, 428 |


a Problems and exercises are excluded, unless they are used in the text. 

b Includes raw data, correlations, estimates of communality, and some other statistics that are not expressly 
part of a particular factor solution. 


Code for direct solutions: 

BiF = Bi-factor 
C = Centroid 

D = Directly from factor model 
M-G = Multiple-group 
Min = Minres 
M-L = Maximum-likelihood 
PF = Principal-factor 
PC = Principal components 
T-F = Two-factor 


Code for derived solutions:

Binorm = Binormamin
D-Obl = Direct Oblimin
Obl-B = Biquartimin (Oblimin type)
Obl-C = Covarimin (Oblimin type)
Obl-Q = Quartimin (Oblimin type)
Omax = Oblimax
Qmax = Quartimax
Subj = Subjective (graphical)
Var = Varimax
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Ahmavaara, Yrjo, 268, 374 
Albert, A. Adrian, 62, 78, 84 
Alker, Hayward, 7 

Analytical methods for multiple-factor solution: 
criteria for, see: 

Binormamin 

Covarimin 

Direct oblimin 

Oblimax 

Oblimin 

Quartimax 

Quartimin 

Varimax 

oblique case, 314-41 
orthogonal case, 293-313 
rationale for, 294-97 
Anderson, T. W., 7, 16, 24, 136 
Andrews, T. G., 7 
Angle: 

between two lines, 55-58, 65 
in general Cartesian coordinates, 58-60 
Apostol, Tom M., 138 
Applications of factor analysis, 6-8 
Aubert, Eugene J., 7 
Axioms for Euclidean geometry, 46 

Baggaley, A., 345 
Banachiewicz, T., 101 
Bargmann, R., 188, 227 
Barlow, J. A., 268, 269 
Bartlett, M. S., 11, 197, 220, 369, 372 
Basis for a space, 52 
B-coefficients: 
aid in computation, 119-20 
calculation of, 127-28 
definition of, 118-19 
Benoit, Commandant, 101 
Berry, Brian J. L., 7 
Bi-factor solution:
computing procedures, 124-32 
form of, 105 

theoretical development, 121-23 
Binormamin criterion, 326 
Bi-orthogonal system of coordinate axes, 289, 334 
Bipolar factors, 100, 153-55, 165-66, 169 
Biquartimin criterion, 325-26 
Bliss, G. A., 8-9 
Bodewig, E., 30 
Boldt, Robert F., 188 
Bonner, R. E., 7 
Borko, H., 7 
Breuel, H., 7 
Burns, Leland S., 7, 13 


Burt, Cyril, 3, 100, 146, 163, 164, 171, 233, 268, 269,
270, 298, 334, 374

Cady, Lee D., 7 
Canonical correlation, 8 
Canonical factor analysis (of Rao), 219 
Canonical form of factor solution, 169-71, 195 
Carroll, John B., 293, 295, 298, 299, 314, 318, 320,
324, 325, 326, 327, 368
Cartesian coordinate system, 47
Cattell, Raymond B., 7, 268, 271, 345, 374
Center of gravity, 171. See also Centroid 
Centroid: 

at origin in (m - 1) space, 174 
coordinates of, 172-73 
distance from origin, 172, 174 
formula, 173 

Centroid method: See Centroid solution 
Centroid solution, 4, 5, 171-86

computing procedures, 177-84 
form of, 101, 108 
theoretical development, 171-77 
Characteristic equation, 139 
Characteristic root. See Eigenvalue 
Characteristic vector. See Eigenvector 
χ² statistic, 197, 219-21
Cholesky, A. L., 101 
Christian, P., 7 
Clelland, Richard C., 7 
Cluster analysis, 233 
Coefficient of congruence, 269-72 
Common-factor space: 
definition of, 63 
determination of, 69-72 
independent of frame of reference, 94 
theorem on, 63 
Common factors: 
conditions for one, 72-74 
conditions for two, 75-77 
conditions for more than two, 77 
definition of, 15 

portions of unit variances factored into, 28 
Communality: 
algebraic solution for, 78 
arbitrary approximations: 
approximate rank, 84 
average of all correlations, 83 
highest correlation, 83 
triad, 83 
unit-rank, 84 

as by-product of maximum-likelihood solution, 
103 

as by-product of minres solution, 104, 189 


complete approximations: 
first averoid factor, 85 
first centroid factor, 84 
iteration by refactoring, 85 
squared multiple correlation (SMC), 86 
definition of, 17 

from rank of correlation matrix, 77-81 

lower bound for, 87 

“observed,” 86 

problem of, 68-92 

proper, 82 

theoretical solution, 81-83 
Complexity of variable, 20, 95, 297
Component, distinguished from factor, 136
Component analysis, 15, 100, 108, 136-37, 219,
228, 348-50

Composite variables, 236, 278 
Composition of variance, 16-19 
Computers, vii, 5, 29, 86, 99, 103, 155-61, 188,
200-3, 211, 227-31, 294, 314, 345
CDC 3200, 202; CDC 6600, 167, 203
CRC 102-A, 457
GE 625, 167, 203
IBM 701, 303; IBM 704, 156, 166, 304, 327, 332,
368, 443, 459; IBM 709, 463; IBM 7090,
441, 442; IBM 7094, 167, 327, 360, 442;
IBM 7044, 202, 335; IBM System/360,
167, 203, 327
Illiac, 219, 230, 303, 318
Johnniac, 156, 447
Ordvac, 156, 166, 465
Philco 2000, 161, 202, 203, 232, 335
Whirlwind I, 230
Comrey, Andrew L., 188 
Congruent factors, 271-72
Contribution of factors to variance of variables. 
See Factor contribution to variance of 
variables 

Contributions of variables to variance of factor, 
353, 358 
Cooley, D. S., 7 
Coombs, Clyde H., 196 
Coordinate axes, 47 
Coordinate hyperplanes, 47 
Coordinate system. See Cartesian coordinate 
system 
Coordinates: 
definition of, 47 
factor coefficients as, 64-67 
Copernican theory, 9 
Correlation: 

as cosine of angle between lines, 62 
between variable and its “common” part, 81 
between variables in different spaces, 66 
definition of, 13 
derived from factor model, 22 
distinction between observed and reproduced, 
22 

reproduced (see Reproduced correlation)
residual (see Residual correlation)

Correlation matrix: 

among common factors, 26 

among variables constitute primary data, 21 


conditions for reduced rank, 72-77 
Gramian properties, 82 
matrix rotation, 27-28 

number of linearly independent conditions, 72 
rank of, 4, 63 

rank under assumption of arbitrary correlation, 
70-71 

singular, 207

Cosine of angle between lines: 
and correlation coefficient, 62 
derivation of formula for, 57-58 
in different spaces, 66 
Covariance, 13 

Covarimin method, bias in, 325, 326; criterion, 
325 

Creager, John A., 7, 369 
Cureton, Edward E., 9 
Cureton, T. K., 7 

Daly, Howell V., 7 
Danford, M. B., 199 
Degan, J. W., 7 

Degree of factorial similarity, 270, 272
Degrees of freedom of rotation in m-space, 259 
Derby, R. C., 7 
Determinant: 
basic concepts of, 30-33 
cofactor, 31 
definition of, 30 
distinguished from matrix, 33 
elements of, 30 
expansion of, 31 
minor, 31 

principal diagonal, 30 

representation by elements of principal 
diagonal, 75 
Diagonal matrix, 35 
Diagonal method, 102 
Dickman, Kern W., 326 
Dickson, L. E., 69 
Difference of matrices, 34 
Dimensionality, 46 

Direct contributions of factors. See Factor contribution
to variance of variables, direct
Direct factor solution, 90-92 
Direct oblimin: 

comparison with biquartimin and quartimin, 
337-41 

computer program, 335 
criterion, 335 
definition of, 335 
method, 334-41 
properties of, 336-41 

relationship between direct and indirect 
methods, 341 

Direction angles of a line, 55 
Direction cosines: 
definition, 55 
in different spaces, 66 

of final reference axes with respect to original 
axes, 258 
property of, 55 
Direction numbers, 56 
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Direction ratios, 60 
Distance: 

in general Cartesian coordinates, 58-60 
in rectangular coordinates, 52-53
Distinction between coordinate and correlation, 

Distinction between determinant and matrix, 33
Distinction between primary and reference axes, 

Distinction between statistical and “practical”
significance, 198

Distribution function, 197 213 7is 
Doolittle, M. H., 37

Doolittle solution, 37 
Dot product, 35, 53 
Doublet, 131 

Dwyer, Paul S., 7, 29, 82, 86, 102, 368, 369, 372
Eckart, C., 188 

Eigenvalues, 140-46, 156-61, 171, 198, 199

Eigenvectors, 140-46, 156-61, 171, 199

Electronic computers. See Computers 
Empirical tests of significance, 198 
Error factors, 18 
Error variance, 18-19 

Estimation of factors. See Measurement of factors
Euclidean geometry, 9, 46-47 

Factor. See Bipolar, Common, Doublet, Error,
Group, Specific, Unique
Factor analysis:

applications of, 6-8 
history of, 3-5 

mathematical solutions contrasted with
statistical, 211-12
matrix formulation of, 4, 24-28
objectives of, 4-5, 14, 187
statistical techniques and computations, 7,

subspaces employed in, 63-67 
See also Factor solutions 

Factor contribution to variance of variables:
direct, 284
joint, 284
total, 17, 138, 284
Factor loading, 15, 318, 348
ambiguous use, 290, 318
sign changes, 153
Factor matrix rotator, 261
Factor matrix V, ambiguous use, 290. See also
Multiple-factor
Factor model:
general description, 14-16 
linear constraint, 11 
matrix notation, 24-28 
statistical fit of, 21-23 
Factor pattern: 

coefficients different from elements of structure, 

definition of, 20 
in matrix form, 25 


Factor solutions:
For derived solutions, see:

Binormamin 

Biquartimin 

Covarimin 

Direct oblimin 

Oblimax 

Oblimin 

Oblique multiple-factor 

Quartimax 

Quartimin 

Subjective 

Varimax 

For direct solutions, see: 

Bi-factor 

Centroid 

Component analysis 
Maximum-likelihood 

Minres 

Multiple-factor 

Principal-factor 

Two-factor 

comparisons, 108, 249, 268-72 
criteria for choice, 94-99 
direct, 90-92 
indeterminacy of, 23-24 
properties of, 93-109 

requiring estimates of communalities, 99-103
requiring estimates of number of common
factors, 103-6
simple models, 113-34
summary of, 107-9
Factor structure:
definition of, 20
elements different from coefficients of pattern,
in matrix form, 26
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